
Economics 583: Econometric Theory I A Primer on Asymptotics Eric Zivot January 14, 2013

The two main concepts in asymptotic theory that we will use are:

1. Consistency
2. Asymptotic Normality

Intuition:
- Consistency: as we get more and more data, we eventually learn the truth.
- Asymptotic normality: as we get more and more data, averages of random variables behave like normally distributed random variables.

Motivating Example: Let $X_1, \ldots, X_n$ denote an independent and identically distributed (iid) random sample with $E[X_i] = \mu$ and $\mathrm{var}(X_i) = \sigma^2$. We don't know the probability density function (pdf) $f(x, \theta)$, but we know the value of $\sigma^2$.

Goal: Estimate the mean value $\mu$ from the random sample of data, compute a 95% confidence interval for $\mu$, and perform a 5% test of $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$.

Remarks

1. A natural question is: how large does $n$ have to be in order for the asymptotic distribution to be accurate?
2. The CLT is a first order asymptotic approximation. Sometimes higher order approximations are possible.
3. We can use Monte Carlo simulation experiments to evaluate the asymptotic approximations for particular cases.
4. We can often use bootstrap techniques to provide numerical estimates for $\mathrm{avar}(\hat\mu)$ and confidence intervals. These are alternatives to the analytic formulas derived from asymptotic theory.

5. If we don't know $\sigma^2$, we have to estimate $\mathrm{avar}(\hat\mu)$. This gives the estimated asymptotic variance
$$\widehat{\mathrm{avar}}(\hat\mu) = \frac{\hat\sigma^2}{n}$$
where $\hat\sigma^2$ is a consistent estimate of $\sigma^2$.
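
The following sketch illustrates remarks 4 and 5 numerically. It is not part of the original notes: the data are simulated (an exponential sample standing in for an unknown pdf, with an assumed sample size of 200), and NumPy is assumed. It compares the analytic estimate $\widehat{\mathrm{avar}}(\hat\mu) = \hat\sigma^2/n$, and the resulting asymptotic 95% confidence interval, with a simple nonparametric bootstrap estimate.

```python
# Minimal sketch: analytic vs. bootstrap estimates of avar(mu_hat) (assumed, simulated data).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)   # hypothetical iid sample; true mu = 2
n = x.size

mu_hat = x.mean()
sigma2_hat = x.var(ddof=1)                 # consistent estimate of sigma^2
se_hat = np.sqrt(sigma2_hat / n)           # sqrt of estimated asymptotic variance
ci = (mu_hat - 1.96 * se_hat, mu_hat + 1.96 * se_hat)

# Nonparametric bootstrap alternative: resample the data, recompute the mean
B = 2000
boot_means = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(B)])
se_boot = boot_means.std(ddof=1)

print(f"mu_hat = {mu_hat:.3f}, analytic SE = {se_hat:.3f}, bootstrap SE = {se_boot:.3f}")
print(f"asymptotic 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```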

Probability Theory Tools

The probability theory tools (theorems) for establishing consistency of estimators are Laws of Large Numbers (LLNs). The tools (theorems) for establishing asymptotic normality are Central Limit Theorems (CLTs). A comprehensive reference is White (1994), Asymptotic Theory for Econometricians, Academic Press.

Laws of Large Numbers

Let $X_1, \ldots, X_n$ be iid random variables with pdf $f(x, \theta)$. For a given function $g$, define the sequence of random variables based on the sample
$$Z_1 = g(X_1),\quad Z_2 = g(X_1, X_2),\quad \ldots,\quad Z_n = g(X_1, \ldots, X_n)$$
Example: Let $X_i \sim N(\mu, \sigma^2)$, so that $\theta = (\mu, \sigma^2)$, and define $Z_n = \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Hence, sample statistics may be treated as a sequence of random variables.

Definition 1 Convergence in Probability Let $Z_1, \ldots, Z_n$ be a sequence of random variables. We say that $Z_n$ converges in probability to $Z$, which may be a constant or a random variable, and write $Z_n \xrightarrow{p} Z$ or $\operatorname{plim}_{n\to\infty} Z_n = Z$, if for every $\varepsilon > 0$,
$$\lim_{n\to\infty}\Pr(|Z_n - Z| > \varepsilon) = 0$$

Remarks

1. $Z_n \xrightarrow{p} Z$ is the same as $Z_n - Z \xrightarrow{p} 0$.
2. For a vector process, $\mathbf{Y}_n = (Y_{1n}, \ldots, Y_{Kn})'$, $\mathbf{Y}_n \xrightarrow{p} \mathbf{c}$ if $Y_{kn} \xrightarrow{p} c_k$ for $k = 1, \ldots, K$.

Definition 2 Consistent Estimator If $\hat\theta$ is an estimator of the scalar parameter $\theta$, then $\hat\theta$ is consistent for $\theta$ if $\hat\theta \xrightarrow{p} \theta$. If $\hat{\boldsymbol\theta}$ is an estimator of the $K \times 1$ vector $\boldsymbol\theta$, then $\hat{\boldsymbol\theta}$ is consistent for $\boldsymbol\theta$ if $\hat\theta_k \xrightarrow{p} \theta_k$ for $k = 1, \ldots, K$.

Remark: All consistency proofs are based on a particular LLN. A LLN is a result that states the conditions under which a sample average of random variables converges to a population expectation. There are many LLN results. The most straightforward is the LLN due to Chebychev.

Chebychev's LLN

Theorem 1 Chebychev's LLN Let $X_1, \ldots, X_n$ be iid random variables with $E[X_i] = \mu < \infty$ and $\mathrm{var}(X_i) = \sigma^2 < \infty$. Then
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} E[X_i] = \mu$$
The proof is based on the famous Chebychev's inequality.

Lemma 2 Chebychev's Inequality Let $X$ be any random variable with $E[X] = \mu < \infty$ and $\mathrm{var}(X) = \sigma^2 < \infty$. Then for every $\varepsilon > 0$,
$$\Pr(|X - \mu| \ge \varepsilon) \le \frac{\mathrm{var}(X)}{\varepsilon^2} = \frac{\sigma^2}{\varepsilon^2}$$

Proof of Chebychev's LLN. Applying Chebychev's inequality to $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$ gives
$$\Pr(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{\mathrm{var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}$$
so that
$$\lim_{n\to\infty}\Pr(|\bar{X}_n - \mu| \ge \varepsilon) \le \lim_{n\to\infty}\frac{\sigma^2}{n\varepsilon^2} = 0$$

Remark

The proof of Chebychev's LLN relies on the concept of convergence in mean square. That is,
$$\mathrm{MSE}(\bar{X}_n, \mu) = E[(\bar{X}_n - \mu)^2] = \frac{\sigma^2}{n} \to 0 \text{ as } n \to \infty$$
In general, if $\mathrm{MSE}(\bar{X}_n, \mu) \to 0$ then $\bar{X}_n \xrightarrow{p} \mu$. However, it may be the case that $\bar{X}_n \xrightarrow{p} \mu$ but $\mathrm{MSE}(\bar{X}_n, \mu) \not\to 0$. This would occur, for example, if $\mathrm{var}(X_i)$ does not exist.
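
A minimal Monte Carlo sketch of this remark, assuming simulated $N(\mu, \sigma^2)$ data and NumPy (the parameter values and sample sizes are illustrative choices): the simulated MSE of $\bar{X}_n$ tracks $\sigma^2/n$ and shrinks to zero, which is convergence in mean square and hence convergence in probability.

```python
# Minimal sketch: MSE(X_bar_n, mu) tracks sigma^2 / n for simulated N(mu, sigma^2) data.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, reps = 1.0, 2.0, 1000
for n in [10, 100, 1000, 10000]:
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)  # reps draws of X_bar_n
    mse = np.mean((xbars - mu) ** 2)
    print(f"n = {n:6d}: simulated MSE = {mse:.5f},  sigma^2/n = {sigma**2 / n:.5f}")
```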

Recall,
$$\mathrm{MSE}(\bar{X}_n, \mu) = E[(\bar{X}_n - \mu)^2] = \mathrm{bias}(\bar{X}_n, \mu)^2 + \mathrm{var}(\bar{X}_n)$$
So convergence in MSE implies that as $n \to \infty$,
$$\mathrm{bias}(\bar{X}_n, \mu)^2 = \left(E[\bar{X}_n] - \mu\right)^2 \to 0,\qquad \mathrm{var}(\bar{X}_n) = E\left[\left(\bar{X}_n - E[\bar{X}_n]\right)^2\right] \to 0$$

Kolmogorov's and Khinchine's LLN

The LLN with the weakest set of conditions on the random sample $X_1, \ldots, X_n$ is due to Kolmogorov and Khinchine.

Theorem 3 Kolmogorov's LLN Let $X_1, \ldots, X_n$ be iid random variables with $E[|X_i|] < \infty$ and $E[X_i] = \mu$. Then
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{p} E[X_i] = \mu$$

Remark

Kolmogorov's LLN does not require $\mathrm{var}(X_i)$ to exist; only the mean needs to exist. That is, this LLN covers random variables with fat-tailed distributions (e.g., Student's $t$ with 2 degrees of freedom).
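
A minimal sketch of this remark under an assumed Student's $t$ distribution with 2 degrees of freedom (mean zero, no finite variance): the sample mean still settles down near zero as $n$ grows, as Kolmogorov's LLN predicts. The sample sizes are arbitrary illustrative choices.

```python
# Minimal sketch: the LLN for fat-tailed Student's t(2) data (mean 0, no finite variance).
import numpy as np

rng = np.random.default_rng(2)
for n in [100, 10_000, 1_000_000]:
    x = rng.standard_t(df=2, size=n)
    print(f"n = {n:9d}: sample mean = {x.mean():.4f}")
```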

Markov's LLN

If we relax the iid assumption, then we can still get convergence of the sample mean. However, further assumptions are required on the sequence of random variables $X_1, \ldots, X_n$. For example, a LLN for non-iid random variables that is particularly useful is due to Markov.

Theorem 4 Markov's LLN Let $X_1, \ldots, X_n$ be a sample of uncorrelated random variables with finite means $E[X_i] = \mu_i < \infty$ and uniformly bounded variances $\mathrm{var}(X_i) = \sigma_i^2 \le M < \infty$ for $i = 1, \ldots, n$. Then
$$\bar{X}_n - \bar{\mu}_n = \frac{1}{n}\sum_{i=1}^n X_i - \frac{1}{n}\sum_{i=1}^n \mu_i = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_i) \xrightarrow{p} 0$$

Remarks:

1. The proof follows directly from Chebychev's inequality.
2. Sometimes the uniformly bounded variance assumption, $\mathrm{var}(X_i) = \sigma_i^2 \le M < \infty$ for $i = 1, \ldots, n$, is stated as $\sup_i \sigma_i^2 < \infty$, where sup denotes the supremum or least upper bound.
3. Notice that when the iid assumption is relaxed, stronger restrictions need to be placed on the variances of the random variables. This is a general principle with LLNs. If some assumptions are weakened then other assumptions must be strengthened. Think of this as an instance of the no free lunch principle applied to probability theory.

LLNs for Serially Correlated Random Variables

If we go further and relax the uncorrelated assumption, then we can still get a LLN result. However, we must control the dependence among the random variables. In particular, if $\gamma_{ij} = \mathrm{cov}(X_i, X_j)$ exists for all $i, j$ and $|\gamma_{ij}|$ is close to zero for $|i - j|$ large, then it can be shown that $\mathrm{var}(\bar{X}_n) \to 0$ and
$$\Pr\left(\left|\bar{X}_n - E[\bar{X}_n]\right| \ge \varepsilon\right) \le \frac{\mathrm{var}(\bar{X}_n)}{\varepsilon^2} \to 0 \text{ as } n \to \infty$$
Further details on LLNs for serially correlated random variables will be discussed in the section on time series concepts.

Theorem 5 Slutsky's Theorem 1 Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random variables and let $a$ and $b$ be constants.

1. If $X_n \xrightarrow{p} a$ then $cX_n \xrightarrow{p} ca$ for any constant $c$.
2. If $X_n \xrightarrow{p} a$ and $Y_n \xrightarrow{p} b$ then $X_n + Y_n \xrightarrow{p} a + b$.
3. If $X_n \xrightarrow{p} a$ and $Y_n \xrightarrow{p} b$ then $X_n Y_n \xrightarrow{p} ab$ and $X_n/Y_n \xrightarrow{p} a/b$ provided $b \neq 0$.
4. If $X_n \xrightarrow{p} a$ and $g(\cdot)$ is a continuous function at $a$ then $g(X_n) \xrightarrow{p} g(a)$.

Example 1 Convergence of the sample variance and standard deviation

Let $X_1, \ldots, X_n$ be iid random variables with $E[X_1] = \mu$ and $\mathrm{var}(X_1) = \sigma^2$. Then the sample variance, given by
$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2$$
is a consistent estimator for $\sigma^2$; i.e., $\hat\sigma^2 \xrightarrow{p} \sigma^2$.

Remarks:

1. Notice that in order to prove the consistency of the sample variance we used the fact that the sample mean is consistent for the population mean. If $\bar{X}_n$ were not consistent for $\mu$, then $\hat\sigma^2$ would not be consistent for $\sigma^2$. The consistency of $\hat\sigma^2$ implies that the finite sample bias of $\hat\sigma^2$ goes to 0 as $n \to \infty$.
2. We may write
$$\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 - (\bar{X}_n - \mu)^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \mu)^2 + o_p(1)
$$

Here, $Y_n = (\bar{X}_n - \mu)^2 \xrightarrow{p} 0$ is a sequence of random variables that converges in probability to zero. In this case, we often use the notation $Y_n = o_p(1)$. This short-hand notation often simplifies the exposition of certain derivations involving probability limits.

3. Given that $\hat\sigma^2 \xrightarrow{p} \sigma^2$ and the square root function is continuous, it follows from Slutsky's Theorem that the sample standard deviation, $\hat\sigma$, is consistent for $\sigma$.
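
A minimal simulation sketch of Example 1 and remark 3, assuming simulated $N(\mu, \sigma^2)$ data with hypothetical parameter values: both $\hat\sigma^2$ and $\hat\sigma$ approach their population counterparts as $n$ grows.

```python
# Minimal sketch: consistency of the sample variance and standard deviation (simulated data).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 5.0, 3.0                               # hypothetical true values
for n in [50, 500, 5000, 50000]:
    x = rng.normal(mu, sigma, size=n)
    sigma2_hat = np.mean((x - x.mean()) ** 2)      # (1/n) sum (X_i - X_bar)^2
    print(f"n = {n:6d}: sigma2_hat = {sigma2_hat:.4f}, sigma_hat = {np.sqrt(sigma2_hat):.4f}")
print(f"true values: sigma^2 = {sigma**2:.1f}, sigma = {sigma:.1f}")
```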

Convergence in Distribution and the Central Limit Theorem

Let $Z_1, \ldots, Z_n$ be a sequence of random variables. For example, let $X_1, \ldots, X_n$ be an iid sample with $E[X_i] = \mu$ and $\mathrm{var}(X_i) = \sigma^2$ and define $Z_n = \sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right)$. We say that $Z_n$ converges in distribution to a random variable $Z$ and write $Z_n \xrightarrow{d} Z$ if
$$F_n(z) = \Pr(Z_n \le z) \to F(z) = \Pr(Z \le z) \text{ as } n \to \infty$$
for every continuity point $z$ of the cumulative distribution function (CDF) of $Z$.

Remarks

1. In most applications, $Z$ is either a normal or chi-square distributed random variable.
2. Convergence in distribution is usually established through Central Limit Theorems (CLTs).
3. If $n$ is large, we can use the convergence in distribution results to justify using the distribution of $Z$ as an approximating distribution for $Z_n$. That is, for $n$ large enough we use the approximation
$$\Pr(Z_n \in A) \approx \Pr(Z \in A)$$
for any set $A \subset \mathbb{R}$.

4. Let $\mathbf{Y}_n = (Y_{1n}, \ldots, Y_{Kn})'$ be a multivariate sequence of random variables. Then $\mathbf{Y}_n \xrightarrow{d} \mathbf{W}$ if and only if $\boldsymbol\lambda'\mathbf{Y}_n \xrightarrow{d} \boldsymbol\lambda'\mathbf{W}$ for any $\boldsymbol\lambda \in \mathbb{R}^K$.

Relationship between Convergence in Distribution and Convergence in Probability

Result 1: If $\mathbf{Y}_n$ converges in probability to the random vector $\mathbf{Y}$, then $\mathbf{Y}_n$ converges in distribution to the probability distribution of $\mathbf{Y}$.

Result 2: If $\mathbf{Y}_n$ converges in distribution to a non-random vector $\mathbf{c}$, i.e., if the probability distribution of $\mathbf{Y}_n$ converges to the point mass distribution at $\mathbf{c}$, then $\mathbf{Y}_n$ converges in probability to $\mathbf{c}$.

Central Limit Theorems

The most famous CLT is due to Lindeberg and Levy.

Theorem 6 Lindeberg-Levy CLT (Greene, 2003 p. 909) Let $X_1, \ldots, X_n$ be an iid sample with $E[X_i] = \mu$ and $\mathrm{var}(X_i) = \sigma^2$. Then
$$Z_n = \sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right) \xrightarrow{d} Z \sim N(0, 1) \text{ as } n \to \infty$$
That is, for all $z \in \mathbb{R}$, $\Pr(Z_n \le z) \to \Phi(z)$ as $n \to \infty$, where
$$\Phi(z) = \int_{-\infty}^{z} \phi(x)\,dx,\qquad \phi(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}x^2\right)$$

Remarks

1. The CLT suggests that we may approximate the distribution of $Z_n = \sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)$ by a standard normal distribution. This, in turn, suggests approximating the distribution of the sample average $\bar{X}_n$ by a normal distribution with mean $\mu$ and variance $\frac{\sigma^2}{n}$.
2. A common short-hand notation is $Z_n \overset{A}{\sim} N(0, 1)$, which implies (by Slutsky's theorem) that
$$\bar{X}_n \overset{A}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right),\qquad \mathrm{avar}(\bar{X}_n) = \frac{\sigma^2}{n}$$
3. A consistent estimate of $\mathrm{avar}(\bar{X}_n)$ is
$$\widehat{\mathrm{avar}}(\bar{X}_n) = \frac{\hat\sigma^2}{n} \text{ such that } \hat\sigma^2 \xrightarrow{p} \sigma^2$$
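
A minimal Monte Carlo sketch of the Lindeberg-Levy CLT. The chi-square(1) distribution here is an arbitrary skewed choice (any iid distribution with finite variance would do), and $n = 500$ and the number of replications are illustrative assumptions; the standardized sample mean $Z_n$ behaves approximately like $N(0,1)$.

```python
# Minimal sketch: Lindeberg-Levy CLT for iid chi-square(1) draws (mean 1, variance 2).
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 1.0, np.sqrt(2.0)
n, reps = 500, 10_000
x = rng.chisquare(df=1, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma     # reps draws of Z_n

# Under N(0,1), Pr(|Z| > 1.96) is about 0.05
print(f"mean(Z_n) = {z.mean():.3f}, var(Z_n) = {z.var():.3f}")
print(f"Pr(|Z_n| > 1.96) = {np.mean(np.abs(z) > 1.96):.3f}  (N(0,1) value: 0.050)")
```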

Theorem 7 Multivariate Lindeberg-Levy CLT (Greene, 2003 p. 912) Let $\mathbf{X}_1, \ldots, \mathbf{X}_n$ be $K$-dimensional iid random vectors with $E[\mathbf{X}_i] = \boldsymbol\mu$ and $\mathrm{var}(\mathbf{X}_i) = E[(\mathbf{X}_i - \boldsymbol\mu)(\mathbf{X}_i - \boldsymbol\mu)'] = \boldsymbol\Sigma$, where $\boldsymbol\Sigma$ is a nonsingular matrix. Let $\boldsymbol\Sigma = \boldsymbol\Sigma^{1/2}\boldsymbol\Sigma^{1/2\prime}$. Then
$$\sqrt{n}\,\boldsymbol\Sigma^{-1/2}(\bar{\mathbf{X}} - \boldsymbol\mu) \xrightarrow{d} \mathbf{Z} \sim N(\mathbf{0}, \mathbf{I}_K)$$
Equivalently, we may write
$$\sqrt{n}(\bar{\mathbf{X}} - \boldsymbol\mu) \xrightarrow{d} N(\mathbf{0}, \boldsymbol\Sigma),\qquad \bar{\mathbf{X}} = \frac{1}{n}\sum_{i=1}^n \mathbf{X}_i,\qquad \bar{\mathbf{X}} \overset{A}{\sim} N\left(\boldsymbol\mu, \frac{1}{n}\boldsymbol\Sigma\right)$$
This result implies that $\mathrm{avar}(\bar{\mathbf{X}}) = \frac{1}{n}\boldsymbol\Sigma$.

Remark

If $\boldsymbol\Sigma$ is unknown and if $\hat{\boldsymbol\Sigma} \xrightarrow{p} \boldsymbol\Sigma$, then we can consistently estimate $\mathrm{avar}(\bar{\mathbf{X}})$ using
$$\widehat{\mathrm{avar}}(\bar{\mathbf{X}}) = \frac{1}{n}\hat{\boldsymbol\Sigma}$$
For example, we could estimate $\boldsymbol\Sigma$ using the sample covariance
$$\hat{\boldsymbol\Sigma} = \frac{1}{n}\sum_{i=1}^n (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})'$$
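
A minimal sketch of this remark with simulated bivariate normal data (the mean vector, covariance matrix, and sample size are illustrative assumptions): $\hat{\boldsymbol\Sigma}$ is the sample covariance and $\widehat{\mathrm{avar}}(\bar{\mathbf{X}}) = \hat{\boldsymbol\Sigma}/n$.

```python
# Minimal sketch: estimating avar(X_bar) = Sigma / n with the sample covariance (simulated data).
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([1.0, -2.0])                          # hypothetical mean vector
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])                      # hypothetical covariance matrix
n = 1000
X = rng.multivariate_normal(mu, Sigma, size=n)      # n x K matrix of iid vectors

X_bar = X.mean(axis=0)
Sigma_hat = (X - X_bar).T @ (X - X_bar) / n         # (1/n) sum (X_i - X_bar)(X_i - X_bar)'
avar_X_bar = Sigma_hat / n                          # estimated avar(X_bar)

print("X_bar =", X_bar)
print("estimated avar(X_bar) =\n", avar_X_bar)
```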

The Lindeberg-Levy CLT is restricted to iid random variables, which limits its usefulness. In particular, it is not applicable to the least squares estimator in the linear regression model with fixed regressors. To see this, consider the simple linear model with a single fixed regressor
$$y_i = \beta x_i + \varepsilon_i$$
where $x_i$ is fixed and $\varepsilon_i$ is iid $(0, \sigma^2)$. The least squares estimator is
$$\hat\beta = \left(\sum_{i=1}^n x_i^2\right)^{-1}\sum_{i=1}^n x_i y_i = \beta + \left(\sum_{i=1}^n x_i^2\right)^{-1}\sum_{i=1}^n x_i \varepsilon_i$$

Re-arranging and multiplying both sides by $\sqrt{n}$ gives
$$\sqrt{n}(\hat\beta - \beta) = \left(\frac{1}{n}\sum_{i=1}^n x_i^2\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n x_i\varepsilon_i$$
The CLT needs to be applied to the random variable $Z_i = x_i\varepsilon_i$. However, even though $\varepsilon_i$ is iid, $Z_i$ is not iid since $\mathrm{var}(Z_i) = \sigma^2 x_i^2$ and, thus, varies with $i$. Hence, the Lindeberg-Levy CLT is not appropriate.

The following Lindeberg-Feller CLT is applicable for the linear regression model with fixed regressors.

Theorem 8 Lindeberg-Feller CLT (Greene, 2003 p. 901) Let $X_1, \ldots, X_n$ be independent (but not necessarily identically distributed) random variables with $E[X_i] = \mu_i$ and $\mathrm{var}(X_i) = \sigma_i^2$. Define $\bar\mu_n = \frac{1}{n}\sum_{i=1}^n \mu_i$ and $\bar\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \sigma_i^2$. Suppose
$$\lim_{n\to\infty}\max_{1\le i\le n}\frac{\sigma_i^2}{n\bar\sigma_n^2} = 0,\qquad \lim_{n\to\infty}\bar\sigma_n^2 = \bar\sigma^2 < \infty$$
Then
$$\sqrt{n}\left(\frac{\bar{X}_n - \bar\mu_n}{\bar\sigma_n}\right) \xrightarrow{d} N(0, 1)\qquad\text{and}\qquad \sqrt{n}(\bar{X}_n - \bar\mu_n) \xrightarrow{d} N(0, \bar\sigma^2)$$
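
A minimal Monte Carlo sketch of the fixed-regressor example above: the regressor values are held fixed across replications, $x_i\varepsilon_i$ is independent but not identically distributed, and yet $\sqrt{n}(\hat\beta - \beta)$ behaves approximately normally with variance $\sigma^2/\left(\frac{1}{n}\sum x_i^2\right)$, as the Lindeberg-Feller CLT suggests. The design, $\beta = 2$, and $\sigma = 1$ are assumptions for illustration.

```python
# Minimal sketch: asymptotic normality of OLS with a fixed regressor (Lindeberg-Feller setting).
import numpy as np

rng = np.random.default_rng(6)
n, reps, beta, sigma = 200, 5000, 2.0, 1.0          # assumed design and parameters
x = np.linspace(0.1, 5.0, n)                        # fixed regressor, same in every replication

stats = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0.0, sigma, size=n)            # iid errors
    y = beta * x + eps
    beta_hat = (x @ y) / (x @ x)                    # OLS slope (no intercept)
    stats[r] = np.sqrt(n) * (beta_hat - beta)

avar_theory = sigma**2 / np.mean(x**2)              # sigma^2 / ((1/n) sum x_i^2)
print(f"simulated var = {stats.var():.3f}, theoretical = {avar_theory:.3f}")
print(f"Pr(|stat| > 1.96*sd) = {np.mean(np.abs(stats) > 1.96 * np.sqrt(avar_theory)):.3f}")
```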

A CLT result that is equivalent to the Lindeberg-Feller CLT but with conditions that are easier to understand and verify is due to Liapounov.

Theorem 9 Liapounov's CLT (Greene, 2003 p. 912) Let $X_1, \ldots, X_n$ be independent (but not necessarily identically distributed) random variables with $E[X_i] = \mu_i$ and $\mathrm{var}(X_i) = \sigma_i^2$. Suppose further that $E[|X_i|^{2+\delta}] < \infty$ for some $\delta > 0$. If $\bar\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \sigma_i^2$ is positive and finite for all $n$ sufficiently large, then
$$\sqrt{n}\left(\frac{\bar{X}_n - \bar\mu_n}{\bar\sigma_n}\right) \xrightarrow{d} N(0, 1)$$
Equivalently,
$$\sqrt{n}(\bar{X}_n - \bar\mu_n) \xrightarrow{d} N(0, \bar\sigma^2),\qquad \lim_{n\to\infty}\bar\sigma_n^2 = \bar\sigma^2 < \infty$$

Remark

There is a multivariate version of the Lindeberg-Feller CLT (see Greene, 2003 p. 913) that can be used to prove that the OLS estimator in the multiple regression model with fixed regressors converges to a normal random variable. For our purposes, we will use a different multivariate CLT that is applicable for random regressors in a cross section or time series context. Details will be given in the section on time series concepts.

Asymptotic Normality

A consistent estimator $\hat{\boldsymbol\theta}$ is asymptotically normally distributed (asymptotically normal) if
$$\sqrt{n}(\hat{\boldsymbol\theta} - \boldsymbol\theta) \xrightarrow{d} N(\mathbf{0}, \boldsymbol\Sigma)$$
Equivalently, $\hat{\boldsymbol\theta}$ is asymptotically normal if
$$\hat{\boldsymbol\theta} \overset{A}{\sim} N\left(\boldsymbol\theta, \frac{1}{n}\boldsymbol\Sigma\right),\qquad \mathrm{avar}(\hat{\boldsymbol\theta}) = \frac{1}{n}\boldsymbol\Sigma$$

Remark

If $\hat{\boldsymbol\Sigma} \xrightarrow{p} \boldsymbol\Sigma$, then
$$\hat{\boldsymbol\theta} \overset{A}{\sim} N\left(\boldsymbol\theta, \frac{1}{n}\hat{\boldsymbol\Sigma}\right),\qquad \widehat{\mathrm{avar}}(\hat{\boldsymbol\theta}) = \frac{1}{n}\hat{\boldsymbol\Sigma}$$
This result is justified by an extension of Slutsky's theorem.

Theorem 10 Slutsky's Theorem 2 (Extension of Slutsky's Theorem to convergence in distribution) Let $X_n$ and $Y_n$ be sequences of random variables such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} c$, where $X$ is a random variable and $c$ is a constant. Then the following results hold:

1. $X_n Y_n \xrightarrow{d} cX$
2. $X_n / Y_n \xrightarrow{d} X/c$ provided $c \neq 0$
3. $X_n + Y_n \xrightarrow{d} X + c$

Remark

Suppose $X_n$ and $Y_n$ are sequences of random variables such that $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} Y$, where $X$ and $Y$ are (possibly correlated) random variables. Then it is not necessarily true that $X_n + Y_n \xrightarrow{d} X + Y$. We have to worry about the dependence between $X_n$ and $Y_n$.

Theorem 11 Continuous Mapping Theorem (CMT) Suppose $g(\cdot): \mathbb{R}^K \to \mathbb{R}^L$ is continuous everywhere and $\mathbf{Y}_n \xrightarrow{d} \mathbf{W}$ as $n \to \infty$. Then
$$g(\mathbf{Y}_n) \xrightarrow{d} g(\mathbf{W}) \text{ as } n \to \infty$$

Remark

This is one of the most useful results in statistics. In particular, we will use it to deduce the asymptotic distribution of test statistics (e.g., Wald, LM, LR, etc.).

The Delta Method

Suppose we have an asymptotically normal estimator $\hat\theta$ for the scalar parameter $\theta$; i.e.,
$$\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} N(0, \sigma^2)$$
Often we are interested in some function of $\theta$, say $\eta = g(\theta)$. Suppose $g(\cdot): \mathbb{R} \to \mathbb{R}$ is continuous and differentiable at $\theta$ and that $g'(\theta) = \frac{dg}{d\theta}$ is continuous. Then the delta method result is
$$\sqrt{n}(\hat\eta - \eta) = \sqrt{n}\left(g(\hat\theta) - g(\theta)\right) \xrightarrow{d} N\left(0,\ g'(\theta)^2\sigma^2\right)$$
Equivalently,
$$\hat\eta \overset{A}{\sim} N\left(g(\theta), \frac{g'(\theta)^2\sigma^2}{n}\right)$$

Remark

The asymptotic variance of $\hat\eta = g(\hat\theta)$ is
$$\mathrm{avar}(\hat\eta) = \mathrm{avar}\left(g(\hat\theta)\right) = \frac{g'(\theta)^2\sigma^2}{n}$$
which depends on $\theta$ and $\sigma^2$. Typically, $\theta$ and/or $\sigma^2$ are unknown, so $\mathrm{avar}\left(g(\hat\theta)\right)$ must be estimated. A consistent estimate of $\mathrm{avar}\left(g(\hat\theta)\right)$ has the form
$$\widehat{\mathrm{avar}}(\hat\eta) = \widehat{\mathrm{avar}}\left(g(\hat\theta)\right) = \frac{g'(\hat\theta)^2\hat\sigma^2}{n}$$

Proof of delta method result. The delta method gets its name from the use of a first order Taylor series expansion. Consider an exact first order Taylor series expansion of $g(\hat\theta)$ at $\hat\theta = \theta$ (i.e., apply the mean value theorem):
$$g(\hat\theta) = g(\theta) + g'(\bar\theta)(\hat\theta - \theta),\qquad \bar\theta = \lambda\hat\theta + (1 - \lambda)\theta,\quad 0 \le \lambda \le 1$$
Multiplying both sides by $\sqrt{n}$ and re-arranging gives
$$\sqrt{n}\left(g(\hat\theta) - g(\theta)\right) = g'(\bar\theta)\,\sqrt{n}(\hat\theta - \theta)$$
Since $\bar\theta$ is between $\hat\theta$ and $\theta$, and since $\hat\theta \xrightarrow{p} \theta$, we have that $\bar\theta \xrightarrow{p} \theta$. Further, since $g'$ is continuous, by Slutsky's Theorem $g'(\bar\theta) \xrightarrow{p} g'(\theta)$. It follows from the convergence in distribution results that
$$\sqrt{n}\left(g(\hat\theta) - g(\theta)\right) = g'(\bar\theta)\,\sqrt{n}(\hat\theta - \theta) \xrightarrow{d} g'(\theta)\cdot N(0, \sigma^2) = N\left(0,\ g'(\theta)^2\sigma^2\right)$$
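
A minimal sketch of the scalar delta method for a hypothetical function $\eta = g(\mu) = \exp(\mu)$ of the mean of an iid sample (the normal data, $\mu = 0.5$, and $n = 400$ are assumptions for illustration): the delta-method standard error $|g'(\hat\mu)|\hat\sigma/\sqrt{n}$ is compared with a Monte Carlo estimate of the sampling variability of $g(\bar{X})$.

```python
# Minimal sketch: delta method for eta = g(mu) = exp(mu), estimated by exp(X_bar) (assumed setup).
import numpy as np

rng = np.random.default_rng(7)
mu, sigma, n = 0.5, 1.0, 400

# One "observed" sample and its delta-method standard error
x = rng.normal(mu, sigma, size=n)
mu_hat, sigma_hat = x.mean(), x.std(ddof=1)
eta_hat = np.exp(mu_hat)
se_delta = np.exp(mu_hat) * sigma_hat / np.sqrt(n)     # |g'(mu_hat)| * sigma_hat / sqrt(n)

# Monte Carlo check of the sampling variability of g(X_bar)
reps = 10_000
eta_sims = np.exp(rng.normal(mu, sigma, size=(reps, n)).mean(axis=1))
print(f"eta_hat = {eta_hat:.3f}, delta-method SE = {se_delta:.4f}, Monte Carlo SE = {eta_sims.std():.4f}")
```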

Now suppose $\boldsymbol\theta \in \mathbb{R}^K$ and we have an asymptotically normal estimator
$$\sqrt{n}(\hat{\boldsymbol\theta} - \boldsymbol\theta) \xrightarrow{d} N(\mathbf{0}, \boldsymbol\Sigma),\qquad \hat{\boldsymbol\theta} \overset{A}{\sim} N\left(\boldsymbol\theta, \frac{1}{n}\boldsymbol\Sigma\right)$$
Let $\boldsymbol\eta = \mathbf{g}(\boldsymbol\theta): \mathbb{R}^K \to \mathbb{R}^L$ denote the parameter of interest, where $\boldsymbol\eta \in \mathbb{R}^L$ and $L \le K$; i.e.,
$$\underset{(L\times 1)}{\boldsymbol\eta} = \mathbf{g}(\boldsymbol\theta) = \begin{pmatrix} g_1(\boldsymbol\theta) \\ g_2(\boldsymbol\theta) \\ \vdots \\ g_L(\boldsymbol\theta) \end{pmatrix}$$

Assume that $\mathbf{g}(\boldsymbol\theta)$ is continuous with continuous first derivatives. Define the $L \times K$ Jacobian matrix
$$\frac{\partial\mathbf{g}(\boldsymbol\theta)}{\partial\boldsymbol\theta'} = \begin{pmatrix} \frac{\partial g_1(\boldsymbol\theta)}{\partial\theta_1} & \frac{\partial g_1(\boldsymbol\theta)}{\partial\theta_2} & \cdots & \frac{\partial g_1(\boldsymbol\theta)}{\partial\theta_K} \\ \frac{\partial g_2(\boldsymbol\theta)}{\partial\theta_1} & \frac{\partial g_2(\boldsymbol\theta)}{\partial\theta_2} & \cdots & \frac{\partial g_2(\boldsymbol\theta)}{\partial\theta_K} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g_L(\boldsymbol\theta)}{\partial\theta_1} & \frac{\partial g_L(\boldsymbol\theta)}{\partial\theta_2} & \cdots & \frac{\partial g_L(\boldsymbol\theta)}{\partial\theta_K} \end{pmatrix}$$

Then
$$\sqrt{n}(\hat{\boldsymbol\eta} - \boldsymbol\eta) = \sqrt{n}\left(\mathbf{g}(\hat{\boldsymbol\theta}) - \mathbf{g}(\boldsymbol\theta)\right) \xrightarrow{d} N\left(\mathbf{0},\ \frac{\partial\mathbf{g}(\boldsymbol\theta)}{\partial\boldsymbol\theta'}\,\boldsymbol\Sigma\,\frac{\partial\mathbf{g}(\boldsymbol\theta)}{\partial\boldsymbol\theta'}'\right)$$

Remark

If $\hat{\boldsymbol\Sigma} \xrightarrow{p} \boldsymbol\Sigma$, then a practically useful result is
$$\mathbf{g}(\hat{\boldsymbol\theta}) \overset{A}{\sim} N\left(\mathbf{g}(\boldsymbol\theta),\ \frac{1}{n}\frac{\partial\mathbf{g}(\hat{\boldsymbol\theta})}{\partial\boldsymbol\theta'}\,\hat{\boldsymbol\Sigma}\,\frac{\partial\mathbf{g}(\hat{\boldsymbol\theta})}{\partial\boldsymbol\theta'}'\right),\qquad \widehat{\mathrm{avar}}\left(\mathbf{g}(\hat{\boldsymbol\theta})\right) = \frac{1}{n}\frac{\partial\mathbf{g}(\hat{\boldsymbol\theta})}{\partial\boldsymbol\theta'}\,\hat{\boldsymbol\Sigma}\,\frac{\partial\mathbf{g}(\hat{\boldsymbol\theta})}{\partial\boldsymbol\theta'}'$$

Example 2 Estimation of Generalized Learning Curve

Consider the generalized learning curve (see Berndt, 1992, chapter 3)
$$C_t = C_1^{1/\delta}\,N_t^{\alpha/\delta}\,Q_t^{(1-\delta)/\delta}\exp(\varepsilon_t)$$
where
$C_t$ = real unit cost at time $t$
$N_t$ = cumulative production up to time $t$
$Q_t$ = production in time $t$
$\varepsilon_t \sim$ iid $(0, \sigma^2)$
$\alpha$ = learning curve parameter
$\delta$ = returns to scale parameter

Intuition: Learning is proxied by cumulative production. If the learning curve effect is present, then as cumulative production (learning) increases real unit costs should fall. If production technology exhibits constant returns to scale, then real unit costs should not vary with the level of production. If returns to scale are increasing, then real unit costs should decline as the level of production increases.

The generalized learning curve may be converted to a linear regression model by taking logs:
$$\ln C_t = \frac{1}{\delta}\ln C_1 + \frac{\alpha}{\delta}\ln N_t + \left(\frac{1-\delta}{\delta}\right)\ln Q_t + \varepsilon_t = \beta_0 + \beta_1\ln N_t + \beta_2\ln Q_t + \varepsilon_t = \mathbf{x}_t'\boldsymbol\beta + \varepsilon_t$$
where
$$\beta_0 = \frac{1}{\delta}\ln C_1,\qquad \beta_1 = \frac{\alpha}{\delta},\qquad \beta_2 = \frac{1-\delta}{\delta},\qquad \mathbf{x}_t = (1,\ \ln N_t,\ \ln Q_t)'$$

The learning curve parameters may be recovered using
$$\alpha = \frac{\beta_1}{1+\beta_2} = g_1(\boldsymbol\beta),\qquad \delta = \frac{1}{1+\beta_2} = g_2(\boldsymbol\beta)$$

Least squares gives consistent and asymptotically normal estimates:
$$\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \xrightarrow{p} \boldsymbol\beta,\qquad \hat\sigma^2 = \frac{1}{n}\sum_{t=1}^n \left(\ln C_t - \mathbf{x}_t'\hat{\boldsymbol\beta}\right)^2 \xrightarrow{p} \sigma^2,\qquad \hat{\boldsymbol\beta} \overset{A}{\sim} N\left(\boldsymbol\beta,\ \hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right)$$
Then, from Slutsky's Theorem, provided $\beta_2 \neq -1$,
$$\hat\alpha = \frac{\hat\beta_1}{1+\hat\beta_2} \xrightarrow{p} \frac{\beta_1}{1+\beta_2} = \alpha,\qquad \hat\delta = \frac{1}{1+\hat\beta_2} \xrightarrow{p} \frac{1}{1+\beta_2} = \delta$$

We can use the delta method to get the asymptotic distribution of $\hat{\boldsymbol\eta} = (\hat\alpha, \hat\delta)'$:
$$\hat{\boldsymbol\eta} = \begin{pmatrix} \hat\alpha \\ \hat\delta \end{pmatrix} = \begin{pmatrix} g_1(\hat{\boldsymbol\beta}) \\ g_2(\hat{\boldsymbol\beta}) \end{pmatrix} = \mathbf{g}(\hat{\boldsymbol\beta}) \overset{A}{\sim} N\left(\mathbf{g}(\boldsymbol\beta),\ \frac{\partial\mathbf{g}(\hat{\boldsymbol\beta})}{\partial\boldsymbol\beta'}\,\hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\,\frac{\partial\mathbf{g}(\hat{\boldsymbol\beta})}{\partial\boldsymbol\beta'}'\right)$$
where
$$\frac{\partial\mathbf{g}(\boldsymbol\beta)}{\partial\boldsymbol\beta'} = \begin{pmatrix} \frac{\partial g_1(\boldsymbol\beta)}{\partial\beta_0} & \frac{\partial g_1(\boldsymbol\beta)}{\partial\beta_1} & \frac{\partial g_1(\boldsymbol\beta)}{\partial\beta_2} \\ \frac{\partial g_2(\boldsymbol\beta)}{\partial\beta_0} & \frac{\partial g_2(\boldsymbol\beta)}{\partial\beta_1} & \frac{\partial g_2(\boldsymbol\beta)}{\partial\beta_2} \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{1+\beta_2} & -\frac{\beta_1}{(1+\beta_2)^2} \\ 0 & 0 & -\frac{1}{(1+\beta_2)^2} \end{pmatrix}$$

Remark

Asymptotic standard errors for $\hat\alpha$ and $\hat\delta$ are given by the square roots of the diagonal elements of
$$\frac{\partial\mathbf{g}(\hat{\boldsymbol\beta})}{\partial\boldsymbol\beta'}\,\hat\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\,\frac{\partial\mathbf{g}(\hat{\boldsymbol\beta})}{\partial\boldsymbol\beta'}'$$
where
$$\frac{\partial\mathbf{g}(\hat{\boldsymbol\beta})}{\partial\boldsymbol\beta'} = \begin{pmatrix} 0 & \frac{1}{1+\hat\beta_2} & -\frac{\hat\beta_1}{(1+\hat\beta_2)^2} \\ 0 & 0 & -\frac{1}{(1+\hat\beta_2)^2} \end{pmatrix}$$
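
A minimal end-to-end sketch of the learning curve example: it simulates data in place of the Berndt (1992) cost data (the parameter values, regressor design, and sample size are assumptions for illustration), runs OLS, recovers $\hat\alpha$ and $\hat\delta$, and computes their delta-method standard errors from the Jacobian above.

```python
# Minimal sketch: delta-method SEs for alpha_hat and delta_hat in the learning curve example
# (simulated data in place of the Berndt (1992) cost data; all values are illustrative).
import numpy as np

rng = np.random.default_rng(8)
n = 200
alpha, delta, sigma = -0.3, 1.2, 0.1                         # hypothetical true parameters
beta = np.array([0.5, alpha / delta, (1 - delta) / delta])   # (beta0, beta1, beta2)

lnN = np.cumsum(rng.uniform(0.5, 1.5, size=n))               # log cumulative production (increasing)
lnQ = rng.normal(2.0, 0.3, size=n)                           # log production
X = np.column_stack([np.ones(n), lnN, lnQ])
y = X @ beta + rng.normal(0.0, sigma, size=n)                # log real unit cost

# OLS estimates and estimated covariance matrix of beta_hat
b = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])
cov_b = s2 * np.linalg.inv(X.T @ X)

# g(beta) = (beta1/(1+beta2), 1/(1+beta2))' and its Jacobian evaluated at b
alpha_hat, delta_hat = b[1] / (1 + b[2]), 1 / (1 + b[2])
G = np.array([[0.0, 1 / (1 + b[2]), -b[1] / (1 + b[2])**2],
              [0.0, 0.0,            -1 / (1 + b[2])**2]])
cov_g = G @ cov_b @ G.T
se_alpha, se_delta = np.sqrt(np.diag(cov_g))

print(f"alpha_hat = {alpha_hat:.3f} (SE {se_alpha:.3f}), delta_hat = {delta_hat:.3f} (SE {se_delta:.3f})")
```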