Notes on Asymptotic Theory: Convergence in Probability and Distribution Introduction to Econometric Theory Econ. 770

Similar documents
Notes on Mathematical Expectations and Classes of Distributions Introduction to Econometric Theory Econ. 770

Economics 241B Review of Limit Theorems for Sequences of Random Variables

Economics 620, Lecture 9: Asymptotics III: Maximum Likelihood Estimation

Economics 583: Econometric Theory I A Primer on Asymptotics

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Chapter 1. GMM: Basic Concepts

A Course on Advanced Econometrics

Economics 620, Lecture 8: Asymptotics I

Estimation, Inference, and Hypothesis Testing

Function Approximation

Regression #3: Properties of OLS Estimator

STAT 7032 Probability. Wlodek Bryc

Stat 5101 Lecture Notes

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011

Economics 620, Lecture 20: Generalized Method of Moment (GMM)

Large Sample Theory. Consider a sequence of random variables Z 1, Z 2,..., Z n. Convergence in probability: Z n

Limiting Distributions

A General Overview of Parametric Estimation and Inference Techniques.

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak.

1 Appendix A: Matrix Algebra

Limiting Distributions

The properties of L p -GMM estimators

Time Series Models and Inference. James L. Powell Department of Economics University of California, Berkeley

V. Properties of estimators {Parts C, D & E in this file}

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

11. Bootstrap Methods

Robust Estimation and Inference for Extremal Dependence in Time Series. Appendix C: Omitted Proofs and Supporting Lemmata

Chapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability

Probability for Statistics and Machine Learning

Finite Population Sampling and Inference

Parametric Inference on Strong Dependence

Regression #4: Properties of OLS Estimator (Part 2)

STAT 7032 Probability Spring Wlodek Bryc

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

6.1 Moment Generating and Characteristic Functions

Introduction: structural econometrics. Jean-Marc Robin

GMM-based inference in the AR(1) panel data model for parameter values where local identi cation fails

δ -method and M-estimation

Product measure and Fubini s theorem

Problem set 1 - Solutions

Quantitative Techniques - Lecture 8: Estimation

Sampling Distributions and Asymptotics Part II

STATISTICS SYLLABUS UNIT I

Panel Data. Applied Econometrics: Topic 6 (March 2)

Can we do statistical inference in a non-asymptotic way? 1

Graduate Econometrics I: Maximum Likelihood I

Statistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames

Economics 241B Estimation with Instruments

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

SIMILAR-ON-THE-BOUNDARY TESTS FOR MOMENT INEQUALITIES EXIST, BUT HAVE POOR POWER. Donald W. K. Andrews. August 2011 Revised March 2012

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables THE UNIVERSITY OF MANCHESTER. 21 June, 9:45-11:45

A Primer on Asymptotics

ECON 616: Lecture 1: Time Series Basics

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

The main results about probability measures are the following two facts:

Chapter 2. Dynamic panel data models

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

Econ 424 Time Series Concepts

Econometrics I, Estimation

STA205 Probability: Week 8 R. Wolpert

The Central Limit Theorem: More of the Story

Markov-Switching Models with Endogenous Explanatory Variables. Chang-Jin Kim 1

Business Economics BUSINESS ECONOMICS. PAPER No. : 8, FUNDAMENTALS OF ECONOMETRICS MODULE No. : 3, GAUSS MARKOV THEOREM

1 Correlation between an independent variable and the error

ECO Class 6 Nonparametric Econometrics

Stochastic Convergence, Delta Method & Moment Estimators

Lecture 18: Central Limit Theorem. Lisa Yan August 6, 2018

Statistical inference


Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Measuring robustness

1. (Regular) Exponential Family

GMM estimation of spatial panels

Introduction to Machine Learning CMU-10701

March 1, Florida State University. Concentration Inequalities: Martingale. Approach and Entropy Method. Lizhe Sun and Boning Yang.

1 The Multiple Regression Model: Freeing Up the Classical Assumptions

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Math Camp II. Calculus. Yiqing Xu. August 27, 2014 MIT

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

Stochastic Processes

Economics 620, Lecture 18: Nonlinear Models

COMP2610/COMP Information Theory

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

Statistical Data Analysis

STAT 200C: High-dimensional Statistics

BTRY 4090: Spring 2009 Theory of Statistics

1. The Multivariate Classical Linear Regression Model

13 Endogeneity and Nonparametric IV

Lecture Notes on Measurement Error

Statistics and Econometrics I

CS 5014: Research Methods in Computer Science. Bernoulli Distribution. Binomial Distribution. Poisson Distribution. Clifford A. Shaffer.

Chapter 3. Point Estimation. 3.1 Introduction

Regression and Statistical Inference

Econ 583 Homework 7 Suggested Solutions: Wald, LM and LR based on GMM and MLE

Asymptotic Statistics-VI. Changliang Zou

ECE531 Lecture 10b: Maximum Likelihood Estimation

Stochastic Processes (Master degree in Engineering) Franco Flandoli


Notes on Asymptotic Theory: Convergence in Probability and Distribution
Introduction to Econometric Theory, Econ. 770
Jonathan B. Hill, Dept. of Economics, University of North Carolina - Chapel Hill
November 9

1 Introduction

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Throughout $\theta$ is a parameter of interest like the mean, variance, correlation, or distribution parameters like Poisson, Binomial, or exponential. Throughout $\{\hat\theta_n\}$ is a sequence of estimators of $\theta$ based on a sample of data $\{X_i\}_{i=1}^n$ with sample size $n$. Assume $\hat\theta_n$ is $\mathcal{F}$-measurable for any $n$. Unless otherwise noted, assume the $X_i$ have the same mean and variance: $X_i \sim (\mu, \sigma^2)$. If appropriate, we may have a bivariate sample $\{X_i, Y_i\}_{i=1}^n$ where $X_i \sim (\mu_X, \sigma_X^2)$ and $Y_i \sim (\mu_Y, \sigma_Y^2)$. Examples include the sample mean, variance, or correlation:

Sample Mean: $\bar{X}_n := \frac{1}{n}\sum_{i=1}^n X_i$

Sample Variance #1: $s_n^2 := \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$

Sample Variance #2: $\hat\sigma_n^2 := \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$

Sample Correlation: $\hat\rho_n := \frac{\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)}{\hat\sigma_{X,n}\,\hat\sigma_{Y,n}}$

Similarly, we may estimate a probability by using a sample relative frequency:

$\hat{P}_n(a) := \frac{1}{n}\sum_{i=1}^n I(X_i \le a)$, the sample percentage of $X_i \le a$.
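Since every object above is a straight average or a function of one, a minimal Python sketch of these estimators may be a useful reference (the function names are mine; inputs are numpy arrays):

import numpy as np

def sample_mean(x):
    return np.mean(x)

def sample_var_unbiased(x):   # s_n^2: divides by n-1
    n = len(x)
    return np.sum((x - np.mean(x)) ** 2) / (n - 1)

def sample_var_mle(x):        # sigma_hat_n^2: divides by n
    return np.mean((x - np.mean(x)) ** 2)

def sample_corr(x, y):        # rho_hat_n, using the 1/n variance estimators
    cx, cy = x - np.mean(x), y - np.mean(y)
    return np.mean(cx * cy) / np.sqrt(sample_var_mle(x) * sample_var_mle(y))

def rel_freq(x, a):           # P_hat_n(a) = fraction of X_i <= a
    return np.mean(x <= a)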

Notice $\hat{P}_n(a)$ estimates $P(X \le a)$. We will look at estimator properties: what $\hat\theta_n$ is on average for any sample size $n$; and what $\hat\theta_n$ becomes as the sample size grows. In every case above the estimator is a variant of a straight average (e.g. $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)(Y_i - \bar{Y}_n)$ is a straight average of $(X_i - \bar{X}_n)(Y_i - \bar{Y}_n)$), or a function of a straight average (e.g. $\hat\sigma_n := (\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2)^{1/2}$, the square root of the average $(X_i - \bar{X}_n)^2$). We therefore pay particular attention to the sample mean.

2 Unbiasedness

Defn. We say $\hat\theta_n$ is an unbiased estimator of $\theta$ if $E[\hat\theta_n] = \theta$. Define bias as

$\mathcal{B}(\hat\theta_n) := E[\hat\theta_n] - \theta$.

An unbiased estimator has zero bias: $\mathcal{B}(\hat\theta_n) = 0$. If we had an infinite number of samples of size $n$, then the average estimate $\hat\theta_n$ across all samples would be $\theta$. An asymptotically unbiased estimator satisfies $\mathcal{B}(\hat\theta_n) \to 0$ as $n \to \infty$.

Claim (Weighted Average): Let $X_i$ have a common mean $\mu := E[X_i]$. Then the weighted average $\hat\mu_n := \sum_{i=1}^n \omega_i X_i$ is an unbiased estimator of $\mu$ if $\sum_{i=1}^n \omega_i = 1$.

Proof: $E[\sum_{i=1}^n \omega_i X_i] = \sum_{i=1}^n \omega_i E[X_i] = \mu \sum_{i=1}^n \omega_i = \mu$. QED.

Corollary (Straight Average): The sample mean $\bar{X}_n := \frac{1}{n}\sum_{i=1}^n X_i$ is a weighted average with flat or uniform weights $\omega_i = 1/n$, hence trivially $\sum_{i=1}^n \omega_i = 1$, hence $E[\bar{X}_n] = \mu$.

The problem then arises as to which weighted average $\sum_{i=1}^n \omega_i X_i$ may be preferred in practice, since any with unit summed weights is unbiased. We will discuss the concept of efficiency below, but the minimum mean-squared-error unbiased estimator has uniform weights if $X_i \sim iid(\mu, \sigma^2)$. That is:

Claim (Sample Mean is Best): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar{X}_n$ is the best linear unbiased estimator of $\mu$ (i.e. it is BLUE).

Proof: We want to solve $\min_\omega V(\sum_{i=1}^n \omega_i X_i)$ subject to $\sum_{i=1}^n \omega_i = 1$. The Lagrangian is

$L_n(\omega, \lambda) := V\left(\sum_{i=1}^n \omega_i X_i\right) + \lambda\left(1 - \sum_{i=1}^n \omega_i\right)$

where by independence $V(\sum_{i=1}^n \omega_i X_i) = \sigma^2 \sum_{i=1}^n \omega_i^2$, hence

$L_n(\omega, \lambda) = \sigma^2 \sum_{i=1}^n \omega_i^2 + \lambda\left(1 - \sum_{i=1}^n \omega_i\right)$.

The first order conditions are

$\frac{\partial}{\partial \omega_i} L_n(\omega, \lambda) = 2\sigma^2 \omega_i - \lambda = 0$ and $\frac{\partial}{\partial \lambda} L_n(\omega, \lambda) = 1 - \sum_{i=1}^n \omega_i = 0$.

Therefore $\omega_i = \lambda/(2\sigma^2)$ is a constant that sums to one. Write $\omega_i = \lambda/(2\sigma^2) =: \omega$. Since $\sum_{i=1}^n \omega_i = n\omega = 1$ it follows $\omega_i = \omega = 1/n$. QED.
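A small simulation makes the claim concrete: any weights that sum to one give an unbiased estimator, but the uniform weights of $\bar{X}_n$ minimize the variance. A sketch, with an arbitrarily chosen linear weight scheme for comparison:

import numpy as np

rng = np.random.default_rng(0)
n, reps, mu, sigma = 20, 100_000, 75.0, 2.0

w_flat = np.full(n, 1.0 / n)                      # uniform weights (the sample mean)
w_lin = np.arange(1, n + 1) / (n * (n + 1) / 2)   # linear weights, also sum to 1

X = rng.normal(mu, sigma, size=(reps, n))
est_flat = X @ w_flat
est_lin = X @ w_lin

# Both are unbiased (simulation averages near mu = 75),
# but the flat weights give the smaller variance, approx sigma^2/n.
print(est_flat.mean(), est_lin.mean())
print(est_flat.var(), est_lin.var())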

Remark: As in many cases here and below, independence can be substituted for uncorrelatedness since the same proof applies: $E[X_i X_j] = E[X_i]E[X_j]$ for all $i \ne j$. We can also substitute uncorrelatedness with a condition that restricts the total correlation across all $X_i$ and $X_j$ for $i \ne j$, but such generality is typically only exploited in time series settings (where $X_j$ is at a different time period).

Claim (Sample Variance): Let $X_i \sim iid(\mu, \sigma^2)$. The estimator $s_n^2$ is unbiased and $\hat\sigma_n^2$ is negatively biased but asymptotically unbiased.

Proof: Notice

$(n-1)s_n^2 = n\hat\sigma_n^2 = \sum_{i=1}^n (X_i - \bar{X}_n)^2 = \sum_{i=1}^n \left((X_i - \mu) - (\bar{X}_n - \mu)\right)^2$
$= \sum_{i=1}^n (X_i - \mu)^2 - 2(\bar{X}_n - \mu)\sum_{i=1}^n (X_i - \mu) + n(\bar{X}_n - \mu)^2$
$= \sum_{i=1}^n (X_i - \mu)^2 - 2n(\bar{X}_n - \mu)^2 + n(\bar{X}_n - \mu)^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar{X}_n - \mu)^2$,

using $\sum_{i=1}^n (X_i - \mu) = n(\bar{X}_n - \mu)$. By the iid assumption and the fact that $\bar{X}_n$ is unbiased,

$E[n(\bar{X}_n - \mu)^2] = nV(\bar{X}_n) = n\frac{\sigma^2}{n} = \sigma^2$.

Further, by definition $\sigma^2 := E[(X_i - \mu)^2]$, hence

$E\left[\sum_{i=1}^n (X_i - \mu)^2\right] = \sum_{i=1}^n E[(X_i - \mu)^2] = n\sigma^2$.

Therefore

$(n-1)E[s_n^2] = nE[\hat\sigma_n^2] = n\sigma^2 - \sigma^2 = (n-1)\sigma^2$.

This implies each claim: $E[s_n^2] = \sigma^2$ ($s_n^2$ is unbiased), $E[\hat\sigma_n^2] = \frac{n-1}{n}\sigma^2 < \sigma^2$ ($\hat\sigma_n^2$ is negatively biased), and $E[\hat\sigma_n^2] = \frac{n-1}{n}\sigma^2 \to \sigma^2$ ($\hat\sigma_n^2$ is asymptotically unbiased). QED.
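The $(n-1)/n$ factor is easy to see by simulation; a minimal sketch with arbitrary parameter choices:

import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 10, 200_000, 4.0

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = X.var(axis=1, ddof=1)         # divides by n-1: unbiased
sig2_hat = X.var(axis=1, ddof=0)   # divides by n: biased down by (n-1)/n

print(s2.mean())                    # approx 4.0
print(sig2_hat.mean())              # approx 4.0 * (n-1)/n = 3.6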

Example: We simulate repeated samples of $X_i \sim N(75, 4)$, each with a fixed sample size $n$. In Figure 1 we plot $\bar{X}_n$ for each sample. In Figure 2 we plot $\hat\mu_n = \sum_{i=1}^n \omega_i X_i$ for each sample, with weights $\omega_i = i/\sum_{j=1}^n j$. The simulation average of the $\bar{X}_n$ is 74.98394 and the simulation average of the $\hat\mu_n$ is 74.98795, while the simulation variance of the $\bar{X}_n$ is smaller than that of the $\hat\mu_n$. Thus both display the same property of unbiasedness, but $\bar{X}_n$ exhibits less dispersion across samples.

[Figure 1: $\bar{X}_n$ across simulated samples. Figure 2: $\hat\mu_n$ across simulated samples.]

3 Convergence in Mean-Square or $L_2$-Convergence

Defn. We say $\hat\theta_n \in \mathbb{R}$ converges to $\theta$ in mean-square if

MSE$(\hat\theta_n) := E[(\hat\theta_n - \theta)^2] \to 0$.

We also write $\hat\theta_n \xrightarrow{m.s.} \theta$ and $\hat\theta_n \to \theta$ in mean-square. If $\hat\theta_n$ is unbiased for $\theta$ then

MSE$(\hat\theta_n) = E[(\hat\theta_n - E[\hat\theta_n])^2] = V(\hat\theta_n)$.

Convergence in mean-square certainly does not require unbiasedness. In general, the MSE is

MSE$(\hat\theta_n) = E[(\hat\theta_n - \theta)^2] = E\left[\left((\hat\theta_n - E[\hat\theta_n]) + (E[\hat\theta_n] - \theta)\right)^2\right]$
$= E[(\hat\theta_n - E[\hat\theta_n])^2] + (E[\hat\theta_n] - \theta)^2 + 2E[\hat\theta_n - E[\hat\theta_n]](E[\hat\theta_n] - \theta)$
$= E[(\hat\theta_n - E[\hat\theta_n])^2] + (E[\hat\theta_n] - \theta)^2$

since $E[\hat\theta_n] - \theta$ is just a constant and $E[\hat\theta_n - E[\hat\theta_n]] = E[\hat\theta_n] - E[\hat\theta_n] = 0$. Hence the MSE is the variance plus bias squared:

MSE$(\hat\theta_n) = V(\hat\theta_n) + \mathcal{B}(\hat\theta_n)^2$.
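The variance-plus-squared-bias decomposition derived above can be verified numerically; a sketch using the negatively biased $\hat\sigma_n^2$ (parameters are illustrative):

import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma2 = 10, 200_000, 4.0

X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = X.var(axis=1, ddof=0)             # sigma_hat_n^2, biased

mse = np.mean((est - sigma2) ** 2)      # direct MSE
var = est.var()                         # variance across samples
bias2 = (est.mean() - sigma2) ** 2      # squared bias

print(mse, var + bias2)                 # the two agree (up to float error)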

If $\hat\theta_n \in \mathbb{R}^k$ then we write

MSE$(\hat\theta_n) := E[(\hat\theta_n - \theta)(\hat\theta_n - \theta)'] \to 0$,

hence component-wise convergence. We may similarly write convergence in $l_2$-norm,

$E[\|\hat\theta_n - \theta\|_2^2] \to 0$ where $\|x\|_2 := \left(\sum_{i=1}^k x_i^2\right)^{1/2}$,

or convergence in matrix (spectral) norm:

$\left\|E[(\hat\theta_n - \theta)(\hat\theta_n - \theta)']\right\| \to 0$, where $\|A\|$ is the largest eigenvalue of $A$.

Both imply convergence with respect to each element: $E[(\hat\theta_{i,n} - \theta_i)^2] \to 0$.

Defn. We say $\hat\theta_n \in \mathbb{R}$ has the property of $L_p$-convergence, or convergence in $p$-norm, to $\theta$ if for $p > 0$

$E|\hat\theta_n - \theta|^p \to 0$.

Clearly $L_2$-convergence and mean-square convergence are equivalent.

Claim (Sample Mean): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar{X}_n \to \mu$ in mean square.

Proof: $E[(\bar{X}_n - \mu)^2] = V(\bar{X}_n) = \sigma^2/n \to 0$. QED.

We only require uncorrelatedness since $V(\bar{X}_n) = \sigma^2/n$ still holds:

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ be uncorrelated. Then $\bar{X}_n \to \mu$ in mean square.

Proof: $E[(\bar{X}_n - \mu)^2] = V(\bar{X}_n) = \sigma^2/n \to 0$. QED.

In fact, we only need all cross covariances to not be too large as the sample size grows:

Claim (Sample Mean): Let $X_i \sim (\mu, \sigma^2)$ satisfy $\frac{1}{n^2}\sum_{i \ne j} Cov(X_i, X_j) \to 0$. Then $\bar{X}_n \to \mu$ in mean square.

Proof: $E[(\bar{X}_n - \mu)^2] = V(\bar{X}_n) = \frac{\sigma^2}{n} + \frac{1}{n^2}\sum_{i \ne j} Cov(X_i, X_j) \to 0$. QED.

Remark: In micro-economic contexts involving cross-sectional data this type of correlatedness is evidently rarely or never entertained: typically we assume the $X_i$ are uncorrelated. It is, however, profoundly popular in macroeconomic and finance contexts where data are time series.
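A quick numerical check of the first claim: the MSE of $\bar{X}_n$ should track $\sigma^2/n$. A minimal sketch with arbitrary parameters:

import numpy as np

rng = np.random.default_rng(3)
mu, sigma2, reps = 75.0, 4.0, 10_000

for n in (10, 100, 1000):
    X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    mse = np.mean((X.mean(axis=1) - mu) ** 2)
    print(n, mse, sigma2 / n)    # empirical MSE approx sigma^2/n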

A very large class of time series random variables satisfies both $Cov(X_i, X_j) \ne 0$ for all $i \ne j$ and $\frac{1}{n^2}\sum_{i \ne j} Cov(X_i, X_j) \to 0$, and therefore exhibits $\bar{X}_n \to \mu$ in mean square.

If $X_i \sim iid(\mu, \sigma^2)$ then $\bar{X}_n \to \mu$ in $p$-norm for any $p \in (0, 2]$, but proving the result for non-integer $p$ without exploiting the finite variance is quite a bit more difficult. There are many types of "maximal inequalities", however, that can be used to prove

$E\left|\sum_{i=1}^n (X_i - \mu)\right|^p \le Kn$ for $p \in (1, 2)$, where $K > 0$ is a finite constant.

Claim (Sample Mean): Let $X_i \sim iid(\mu, \sigma^2)$. Then $\bar{X}_n \to \mu$ in $p$-norm for any $p \in (0, 2]$.

Proof:

$E|\bar{X}_n - \mu|^p \le \left\{E(\bar{X}_n - \mu)^2\right\}^{p/2} = \left\{\frac{\sigma^2}{n}\right\}^{p/2} \to 0$,

since $|\cdot|^{p/2}$ is concave for $p \le 2$ and Jensen's inequality applies. QED.

Example: We simulate iid normal $X_i$ with variance $\sigma^2 = 400$ over an increasing grid of sample sizes. In Figure 3 we plot $\bar{X}_n$ and $V(\bar{X}_n) = 400/n$ over the sample size. Notice the high volatility for small $n$.

[Figure 3: $\bar{X}_n$ and $V(\bar{X}_n) = 400/n$ over sample size $n$.]
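The $p$-norm claim can likewise be checked by estimating $E|\bar{X}_n - \mu|^p$ by Monte Carlo and comparing it to the bound $\{\sigma^2/n\}^{p/2}$; a sketch with $p = 1$ and arbitrary parameters:

import numpy as np

rng = np.random.default_rng(4)
mu, sigma, p, reps = 75.0, 20.0, 1.0, 10_000

for n in (10, 100, 1000):
    X = rng.normal(mu, sigma, size=(reps, n))
    lp = np.mean(np.abs(X.mean(axis=1) - mu) ** p)
    bound = (sigma**2 / n) ** (p / 2)    # Jensen bound {sigma^2/n}^(p/2)
    print(n, lp, bound)                  # lp <= bound, and both -> 0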

4 Convergence in Probability: WLLN

Defn. We say $\hat\theta_n$ converges in probability to $\theta$ if

$\lim_{n\to\infty} P(|\hat\theta_n - \theta| > \epsilon) = 0$ for all $\epsilon > 0$.

We variously write

$\hat\theta_n \xrightarrow{p} \theta$ and $plim_{n\to\infty}\,\hat\theta_n = \theta$,

and we say $\hat\theta_n$ is a consistent estimator of $\theta$. Since probability convergence is convergence in the sequence $\{P(|\hat\theta_n - \theta| > \epsilon)\}_{n \ge 1}$, by the definition of a limit it follows that for every $\epsilon, \delta > 0$ there exists $N \ge 1$ such that

$P(|\hat\theta_n - \theta| > \epsilon) < \delta$ for all $n \ge N$.

That is, for a large enough sample size, $\hat\theta_n$ is guaranteed to be as close to $\theta$ as we choose (i.e. the $\epsilon$) with as great a probability as we choose (i.e. $1 - \delta$).

Claim (Law of Large Numbers = LLN): If $X_i \sim iid(\mu, \sigma^2)$ then $\bar{X}_n \xrightarrow{p} \mu$.

Proof: By Chebyshev's inequality and independence, for any $\epsilon > 0$

$P(|\bar{X}_n - \mu| > \epsilon) \le \frac{V(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0$. QED.

Remark 1: We call this a Weak Law of Large Numbers [WLLN] since convergence is in probability. A Strong LLN based on a stronger form of convergence is given below.

Remark 2: We only need uncorrelatedness to get $V(\bar{X}_n) = \sigma^2/n \to 0$. The WLLN, however, extends to many forms of dependent random variables.

Remark 3: In the iid case we only need $E|X_i| < \infty$, although the proof is substantially more complicated. Even for non-iid data we typically only need $E|X_i|^{1+\iota} < \infty$ for infinitesimal $\iota > 0$ (pay close attention to scholarly articles you read, and to your own assumptions: usually far stronger assumptions are imposed than are actually required).

The weighted average $\sum_{i=1}^n \omega_{i,n} X_i$ is also consistent as long as the weights decay with the sample size. Thus we write the weight as $\omega_{i,n}$.

Claim: If $X_i \sim iid(\mu, \sigma^2)$ then $\sum_{i=1}^n \omega_{i,n} X_i \xrightarrow{p} \mu$ if $\sum_{i=1}^n \omega_{i,n} = 1$ and $\sum_{i=1}^n \omega_{i,n}^2 \to 0$.

Proof: By Chebyshev's inequality, independence, and $\sum_{i=1}^n \omega_{i,n} = 1$, for any $\epsilon > 0$

$P\left(\left|\sum_{i=1}^n \omega_{i,n} X_i - \mu\right| > \epsilon\right) \le \frac{1}{\epsilon^2} E\left[\left(\sum_{i=1}^n \omega_{i,n}(X_i - \mu)\right)^2\right] = \frac{1}{\epsilon^2}\sum_{i=1}^n \omega_{i,n}^2 E[(X_i - \mu)^2] = \frac{\sigma^2}{\epsilon^2}\sum_{i=1}^n \omega_{i,n}^2 \to 0$,

which proves the claim. QED.

An example is $\bar{X}_n$ with $\omega_{i,n} = 1/n$, but also the weights $\omega_{i,n} = i/\sum_{j=1}^n j$ used in Figure 2.

Example: We simulate $X_i$ with mean $\mu = 75$ over sample sizes up to $n = 10{,}000$. In Figures 4 and 5 we plot $\bar{X}_n$ and $\hat\mu_n = \sum_{i=1}^n \omega_{i,n} X_i$ over the sample size. Notice the high volatility for small $n$.

[Figure 4: $\bar{X}_n$ over sample size $n$. Figure 5: $\hat\mu_n$ over sample size $n$.]
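To see the WLLN and the role of Chebyshev's inequality, estimate $P(|\bar{X}_n - \mu| > \epsilon)$ by simulation and compare it to the bound $\sigma^2/(n\epsilon^2)$; a sketch with arbitrary parameters:

import numpy as np

rng = np.random.default_rng(5)
mu, sigma2, eps, reps = 75.0, 4.0, 0.5, 10_000

for n in (5, 50, 500):
    X = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
    p_hat = np.mean(np.abs(X.mean(axis=1) - mu) > eps)
    cheb = sigma2 / (n * eps * eps)      # Chebyshev upper bound
    print(n, p_hat, min(cheb, 1.0))      # p_hat <= bound, and both -> 0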

Claim (Slutsky Theorem): Let $\hat\theta_n \in \mathbb{R}^k$. If $\hat\theta_n \xrightarrow{p} \theta$ and $g : \mathbb{R}^k \to \mathbb{R}$ is continuous (except possibly with countably many discontinuity points) then $g(\hat\theta_n) \xrightarrow{p} g(\theta)$.

Corollary: Let $\hat\theta_{i,n} \xrightarrow{p} \theta_i$, $i = 1, 2$. Then $\hat\theta_{1,n} + \hat\theta_{2,n} \xrightarrow{p} \theta_1 + \theta_2$, $\hat\theta_{1,n}\hat\theta_{2,n} \xrightarrow{p} \theta_1\theta_2$, and if $\theta_2 \ne 0$ and $\liminf_{n\to\infty} |\hat\theta_{2,n}| > 0$ then $\hat\theta_{1,n}/\hat\theta_{2,n} \xrightarrow{p} \theta_1/\theta_2$.

Claim: If $X_i \sim iid(\mu, \sigma^2)$ and $E[X_i^4] < \infty$ then $\hat\sigma_n^2 \xrightarrow{p} \sigma^2$.

Proof: Note

$\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar{X}_n - \mu)^2$.

By the LLN $\bar{X}_n \xrightarrow{p} \mu$, therefore by the Slutsky Theorem $(\bar{X}_n - \mu)^2 \xrightarrow{p} 0$. By $E[X_i^4] < \infty$ it follows $(X_i - \mu)^2$ is iid with a finite variance, hence it satisfies the LLN: $\frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 \xrightarrow{p} E[(X_i - \mu)^2] = \sigma^2$. QED.

Claim: If $(X_i, Y_i)$ are iid with finite fourth moments then the sample correlation $\hat\rho_n \xrightarrow{p} \rho$, the population correlation.

Example: We simulate $X_i \sim N(7, 400)$ and an independent error $\epsilon_i \sim N(0, 900)$, and construct $Y_i = 2X_i + 43 + \epsilon_i$. The true correlation is

$\rho = \frac{E[X_i Y_i] - E[X_i]E[Y_i]}{\sqrt{V(X_i)}\sqrt{V(Y_i)}} = \frac{2E[X_i^2] + 43E[X_i] - E[X_i](2E[X_i] + 43)}{20\sqrt{4 \cdot 400 + 900}} = \frac{2(7^2 + 400) + 43 \cdot 7 - 7(2 \cdot 7 + 43)}{20\sqrt{4 \cdot 400 + 900}} = \frac{800}{20 \cdot 50} = .8$.

We estimate the correlation for samples with sizes ranging from $n = 5$ up to roughly $10{,}000$. Figure 6 demonstrates consistency and therefore the Slutsky Theorem.

[Figure 6: the sample correlation $\hat\rho_n$ over sample size $n$, converging to $\rho = .8$.]
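A sketch of this correlation experiment (the constants follow the reconstruction above and should be treated as illustrative):

import numpy as np

rng = np.random.default_rng(6)

def rho_hat(x, y):
    return np.corrcoef(x, y)[0, 1]

for n in (5, 50, 500, 5000):
    x = rng.normal(7.0, 20.0, n)                     # X ~ (7, 400)
    y = 2.0 * x + 43.0 + rng.normal(0.0, 30.0, n)    # eps ~ (0, 900)
    print(n, rho_hat(x, y))                          # -> .8 as n grows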

5 Almost Sure Convergence: SLLN

Defn. We say $\hat\theta_n$ converges almost surely to $\theta$ if

$P\left(\lim_{n\to\infty} \hat\theta_n = \theta\right) = 1$.

This is identical to

$\lim_{n\to\infty} P\left(\sup_{m \ge n} |\hat\theta_m - \theta| > \epsilon\right) = 0$ for all $\epsilon > 0$.

We variously write $\hat\theta_n \xrightarrow{a.s.} \theta$ and $\hat\theta_n \to \theta$ a.s., and we say $\hat\theta_n$ is strongly consistent for $\theta$. We have the following relationships.

Claim: 1. $\hat\theta_n \xrightarrow{m.s.} \theta$ implies $\hat\theta_n \xrightarrow{p} \theta$; 2. $\hat\theta_n \xrightarrow{a.s.} \theta$ implies $\hat\theta_n \xrightarrow{p} \theta$.

Proof: 1. $P(|\hat\theta_n - \theta| > \epsilon) \le E[(\hat\theta_n - \theta)^2]/\epsilon^2$ by Chebyshev's inequality. If $E[(\hat\theta_n - \theta)^2] \to 0$ (i.e. $\hat\theta_n \xrightarrow{m.s.} \theta$) then $P(|\hat\theta_n - \theta| > \epsilon) \to 0$ where $\epsilon > 0$ is arbitrary. Therefore $\hat\theta_n \xrightarrow{p} \theta$.

2. $P(|\hat\theta_n - \theta| > \epsilon) \le P(\sup_{m \ge n} |\hat\theta_m - \theta| > \epsilon)$ since $\sup_{m \ge n} |\hat\theta_m - \theta| \ge |\hat\theta_n - \theta|$. Therefore if $P(\sup_{m \ge n} |\hat\theta_m - \theta| > \epsilon) \to 0$ for all $\epsilon > 0$ (i.e. $\hat\theta_n \xrightarrow{a.s.} \theta$) then $P(|\hat\theta_n - \theta| > \epsilon) \to 0$ for all $\epsilon > 0$ (i.e. $\hat\theta_n \xrightarrow{p} \theta$). QED.

If $\hat\theta_n$ is bounded w.p.1 then $\hat\theta_n \xrightarrow{p} \theta$ implies $E[\hat\theta_n] \to \theta$, which is asymptotic unbiasedness (see Bierens). By the Slutsky Theorem $\hat\theta_n \xrightarrow{p} \theta$ implies $(\hat\theta_n - \theta)^2 \xrightarrow{p} 0$, hence, again by boundedness, $E[(\hat\theta_n - \theta)^2] \to 0$: convergence in probability implies convergence in mean-square. This proves the following (and gives almost sure convergence as the "strongest" form: the one that implies all the rest).

Claim (a.s. implies i.p. implies m.s.): Let $\hat\theta_n$ be bounded w.p.1: $P(|\hat\theta_n| \le K) = 1$

for finite $K > 0$. Then $\hat\theta_n \xrightarrow{a.s.} \theta$ implies $\hat\theta_n \xrightarrow{p} \theta$, which in turn implies asymptotic unbiasedness and $\hat\theta_n \xrightarrow{m.s.} \theta$.

Claim (Strong Law of Large Numbers = SLLN): If $X_i \sim iid(\mu, \sigma^2)$ then $\bar{X}_n \xrightarrow{a.s.} \mu$.

Remark: The Slutsky Theorem carries over to strong convergence.

Example: Let $X_i \sim iid(\mu, \sigma^2)$ and define $\hat\theta_n := 1/(1 + \bar{X}_n^2)$. Then $P(|\hat\theta_n| \le 1) = 1$. Moreover, under the iid assumption $\bar{X}_n \xrightarrow{a.s.} \mu$ by the SLLN, hence by the Slutsky Theorem

$\hat\theta_n \xrightarrow{a.s.} \frac{1}{1 + \mu^2}$.

Therefore, with $\theta := 1/(1 + \mu^2)$,

$E[\hat\theta_n] \to \theta$ and $\hat\theta_n \xrightarrow{p} \theta$ and $E\left[(\hat\theta_n - \theta)^2\right] \to 0$.
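A path-wise look at strong consistency for this example: a single realization of $\{\hat\theta_n\}$ settles down at $1/(1 + \mu^2)$. A sketch with arbitrary $\mu$ and $\sigma$:

import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 2.0, 1.0                   # illustrative choices
target = 1.0 / (1.0 + mu**2)           # a.s. limit 1/(1+mu^2) = 0.2

x = rng.normal(mu, sigma, 100_000)
xbar = np.cumsum(x) / np.arange(1, x.size + 1)   # running sample means
theta = 1.0 / (1.0 + xbar**2)          # bounded: |theta_n| <= 1 always

for n in (10, 100, 1000, 100_000):
    print(n, theta[n - 1], target)     # the single path settles at 0.2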

6 Convergence in Distribution: CLT

Defn. We say $\hat\theta_n$ converges in distribution to a distribution $F$, or to a random variable $X$ with distribution $F$, if

$\lim_{n\to\infty} P(\hat\theta_n \le x) = F(x)$ for every $x$ on the support.

Thus, while $\hat\theta_n$ may itself not be $F$-distributed, asymptotically it is. We write $\hat\theta_n \xrightarrow{d} F$ or $\hat\theta_n \xrightarrow{d} X$ where $X \sim F$. The notation $\hat\theta_n \xrightarrow{d} X$ is a bit awkward, because $F$ characterizes infinitely many random variables. We are therefore saying there is some random draw from $F$ that $\hat\theta_n$ is becoming. Which random draw is not specified.

6.1 Central Limit Theorem

By far the most famous result concerns the sample mean $\bar{X}_n$. Convergence of some estimator $\hat\theta_n$ in a monumentally large number of cases reduces to convergence of a sample mean of something, call it $Z_i$. This carries over to the sample correlation, regression model estimation methods like Ordinary Least Squares, GMM, and Maximum Likelihood, as well as non-parametric estimation, and on and on.

As usual, we limit ourselves to the iid case. The following substantially carries over to non-iid data, and based on a rarely cited obscure fact does not even require a finite variance (I challenge you to find a proof of this, or to ever discover any econometrics textbook that accurately states this).

Claim (Central Limit Theorem = CLT): If $X_i \sim iid(\mu, \sigma^2)$ then

$Z_n := \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} N(0, 1)$.

Remark 1: This is famously cited as the Lindeberg-Levy CLT. Historically, however, the proof arose in different camps sometime between roughly 1890 and 1930 (covering Lindeberg, Levy, Chebyshev, Markov and Lyapunov).

Remark 2: Notice by construction $Z_n := \sqrt{n}(\bar{X}_n - \mu)/\sigma$ is a standardized sample mean because $E[\bar{X}_n] = \mu$ by identical distributedness and $V(\bar{X}_n) = \sigma^2/n$ by independence and identical distributedness. Thus

$Z_n := \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} = \frac{\bar{X}_n - \mu}{\sqrt{\sigma^2/n}} = \frac{\bar{X}_n - E[\bar{X}_n]}{\sqrt{V(\bar{X}_n)}}$.

Therefore $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has mean 0 and variance 1:

$E\left[\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}\right] = \frac{\sqrt{n}}{\sigma}\left(E[\bar{X}_n] - \mu\right) = 0$

$V\left(\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}\right) = \frac{n}{\sigma^2}V(\bar{X}_n) = \frac{n}{\sigma^2}\frac{\sigma^2}{n} = 1$.

Thus, even as $n \to \infty$ the random variable $Z_n \sim (0, 1)$. Although this is a long way from proving $Z_n$ has a definable distribution, even in the limit, it does help to point out that the term $\sqrt{n}$ is necessary to stabilize $\bar{X}_n - \mu$, for otherwise we simply have $\bar{X}_n - \mu \xrightarrow{p} 0$.

Remark 3: Asymptotically $Z_n := \sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a standard normal density $\phi(z) \propto \exp\{-z^2/2\}$. Define $Z_i := (X_i - \mu)/\sigma$, hence

$Z_n := \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i$.

We will show the characteristic function $E[e^{\mathrm{i}tZ_n}] \to e^{-t^2/2}$. The latter is the characteristic function of a standard normal, while characteristic functions and distributions have a unique correspondence: only standard normals have a characteristic function like $e^{-t^2/2}$.

By independence and identical distributedness

$E\left[e^{\mathrm{i}tZ_n}\right] = \prod_{i=1}^n E\left[e^{\mathrm{i}tZ_i/\sqrt{n}}\right] = \left(E\left[e^{\mathrm{i}tZ_1/\sqrt{n}}\right]\right)^n. \quad (1)$

Now expand $e^{\mathrm{i}tZ_1/\sqrt{n}}$ around $t = 0$ by a second order Taylor expansion:

$e^{\mathrm{i}tZ_1/\sqrt{n}} = 1 + \frac{\mathrm{i}tZ_1}{\sqrt{n}} + \frac{1}{2}\frac{(\mathrm{i}tZ_1)^2}{n} + R_n = 1 + \frac{\mathrm{i}tZ_1}{\sqrt{n}} - \frac{1}{2}\frac{t^2 Z_1^2}{n} + R_n$

where $R_n$ is a remainder term that is a function of $t$ and $Z_1$. Now take expectations as in (1), and note $E[Z_1] = E[(X_1 - \mu)/\sigma] = 0$ and $E[Z_1^2] = E[(X_1 - \mu)^2]/\sigma^2 = \sigma^2/\sigma^2 = 1$:

$E\left[e^{\mathrm{i}tZ_1/\sqrt{n}}\right] = 1 + \frac{\mathrm{i}t}{\sqrt{n}}E[Z_1] - \frac{t^2}{2n}E[Z_1^2] + E[R_n] = 1 - \frac{t^2}{2n} + \rho_n$, where $\rho_n := E[R_n]$.

It is easy to prove $R_n$ is a bounded random variable, in particular $|R_n| \le |tZ_1/\sqrt{n}|^2$ w.p.1 (see Bierens), so even if $Z_1$ does not have higher moments we know $|\rho_n| < \infty$. Further, $n\rho_n \to 0$ (see Bierens). Now take the $n$-power in (1): by the Binomial expansion

$\left(1 - \frac{t^2}{2n} + \rho_n\right)^n = \sum_{k=0}^n \binom{n}{k}\left(1 - \frac{t^2}{2n}\right)^{n-k}\rho_n^k = \left(1 - \frac{t^2}{2n}\right)^n + \sum_{k=1}^n \binom{n}{k}\left(1 - \frac{t^2}{2n}\right)^{n-k}\rho_n^k$.

The first term satisfies

$\left(1 - \frac{t^2}{2n}\right)^n \to e^{-t^2/2}$

because the sequence $\{(1 + a/n)^n\}$ converges: $(1 + a/n)^n \to e^a$ (simply put $a = -t^2/2$). For the second term notice for large enough $n$ we have $|1 - t^2/(2n)| \le 1$, hence

$\left|\sum_{k=1}^n \binom{n}{k}\left(1 - \frac{t^2}{2n}\right)^{n-k}\rho_n^k\right| \le \sum_{k=1}^n \binom{n}{k}|\rho_n|^k = (1 + |\rho_n|)^n - 1$.

See Bierens for details that verify $(1 + |\rho_n|)^n - 1 \to 0$. QED.
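The key limit in the proof, $(1 - t^2/(2n))^n \to e^{-t^2/2}$, is easy to verify numerically; a tiny sketch at an arbitrary point $t$:

import numpy as np

t = 1.7                                       # arbitrary evaluation point
for n in (10, 100, 10_000, 1_000_000):
    approx = (1.0 - t * t / (2.0 * n)) ** n
    print(n, approx, np.exp(-t * t / 2.0))    # converges to exp(-t^2/2)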

Example (Bernoulli): The most striking way to demonstrate the CLT is to begin with the least normal of data, a Bernoulli random variable, which is discrete and takes only two finite values, and show $Z_n \xrightarrow{d} N(0, 1)$, a continuous random variable with infinite support. We simulate $X_i \sim$ Bernoulli$(p)$ for $n$ = 5, 50, 500, 5000 and compute

$Z_n := \sqrt{n}\,\frac{\bar{X}_n - p}{\sqrt{p(1-p)}}$,

since $\mu = p$ and $\sigma^2 = p(1-p)$. In order to show the small sample distribution of $Z_n$ we need a sample of $Z_n$'s, so we repeat the simulation 1000 times. We plot the relative frequencies of the sample of 1000 $Z_n$'s for each $n$. Let $\{Z_{n,j}\}_{j=1}^{1000}$ be the simulated sample of $Z_n$'s. The relative frequencies are the percentages $\frac{1}{1000}\sum_{j=1}^{1000} I(a < Z_{n,j} \le a + .01)$ for interval endpoints $a \in \{-2.50, -2.49, -2.48, \ldots, 2.49, 2.50\}$. See Figure 7. For the sake of comparison, in Figure 8 we plot the relative frequencies for one sample of 1000 iid standard normal random variables $Z \sim N(0, 1)$.

Another way to see how $Z_n$ becomes a standard normal random variable is to compute the quantile $q_n$ such that $P(Z_n \le q_n) = .975$. A standard normal satisfies $P(Z \le 1.96) = .975$. We call $q_n$ an empirical quantile since it is based on a simulated set of samples. We simulate 10,000 samples for each size $n$ = 5, 105, 205, ..., 5005 and compute $q_n$. See Figure 9. As $n$ increases, $q_n \to 1.96$.

[Figure 7: relative frequencies of the 1000 standardized means $Z_n$ for Bernoulli data, for $n$ = 5, 50, 500, 5000.]

[Figure 8: relative frequencies for one sample of 1000 iid standard normal draws. Figure 9: empirical .975 quantiles $q_n$ over sample size $n$, converging to 1.96.]
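A compact sketch of the Bernoulli experiment (the choice $p = .8$ is illustrative), reporting the empirical .975 quantile of the standardized mean:

import numpy as np

rng = np.random.default_rng(8)
p, reps = 0.8, 100_000
sigma = np.sqrt(p * (1 - p))

for n in (5, 50, 500, 5000):
    counts = rng.binomial(n, p, reps)               # sum of n Bernoulli(p) draws
    Z = np.sqrt(n) * (counts / n - p) / sigma       # standardized sample means
    print(n, np.quantile(Z, 0.975))                 # approaches 1.96

Drawing the Binomial count directly avoids storing all $n \times$ reps Bernoulli draws, since the count is exactly the sum of $n$ Bernoulli trials.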