Asymptotic Results for the Linear Regression Model

C. Flinn
November 29, 2000

1. Asymptotic Results under Classical Assumptions

The following results apply to the linear regression model $y = X\beta + \varepsilon$, where $X$ is of dimension $(n \times k)$, $\varepsilon$ is an (unknown) $(n \times 1)$ vector of disturbances, and $\beta$ is an (unknown) $(k \times 1)$ parameter vector. We assume that $n \gg k$ and that $\rho(X) = k$. This implies that $\rho(X'X) = k$ as well. Throughout we assume that the classical conditional moment assumptions apply, namely

$$E(\varepsilon_i \mid X) = 0 \quad \forall i, \qquad V(\varepsilon_i \mid X) = \sigma^2 \quad \forall i.$$

We first show that the probability limit of the OLS estimator is $\beta$, i.e., that it is consistent. In particular, we know that

$$\hat\beta = \beta + (X'X)^{-1} X'\varepsilon \implies E(\hat\beta \mid X) = \beta + (X'X)^{-1} X' E(\varepsilon \mid X) = \beta.$$

In terms of the (conditional) variance of the estimator $\hat\beta$,

$$V(\hat\beta \mid X) = \sigma^2 (X'X)^{-1}.$$
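As a quick numerical illustration of these two facts, the sketch below simulates repeated draws of the disturbance vector for a fixed design matrix and compares the Monte Carlo mean and covariance of the OLS estimator with $\beta$ and $\sigma^2 (X'X)^{-1}$. The sample size, coefficients, and $\sigma$ are invented for the example.

```python
import numpy as np

# Hypothetical Monte Carlo check: E(beta_hat | X) = beta and
# V(beta_hat | X) = sigma^2 (X'X)^{-1}.  The design X is held fixed while
# the disturbances are redrawn; all specific numbers are made up.
rng = np.random.default_rng(0)
n, k, sigma = 200, 3, 2.0
beta = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(n, k))                 # fixed design with rank k

R = 20_000
draws = np.empty((R, k))
for rep in range(R):
    eps = rng.normal(scale=sigma, size=n)   # E(eps | X) = 0, V(eps | X) = sigma^2
    y = X @ beta + eps
    draws[rep] = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: (X'X)^{-1} X'y

theory_cov = sigma**2 * np.linalg.inv(X.T @ X)
print(np.max(np.abs(draws.mean(axis=0) - beta)))     # near 0 (unbiasedness)
print(np.max(np.abs(np.cov(draws.T) - theory_cov)))  # near 0 (conditional variance)
```

Holding $X$ fixed across replications is what makes this a check of the *conditional* mean and variance formulas.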

Now we will rely heavily on the following assumption:

$$\lim_{n \to \infty} \frac{X'X}{n} = Q,$$

where $Q$ is a finite, nonsingular $k \times k$ matrix. Then we can write the covariance of $\hat\beta$ in a sample of size $n$ explicitly as

$$V(\hat\beta \mid X_n) = \frac{\sigma^2}{n} \left( \frac{X'X}{n} \right)^{-1},$$

so that

$$\lim_{n \to \infty} V(\hat\beta \mid X_n) = \lim_{n \to \infty} \frac{\sigma^2}{n} \cdot \lim_{n \to \infty} \left( \frac{X'X}{n} \right)^{-1} = 0 \cdot Q^{-1} = 0.$$

Since the asymptotic variance of the estimator is 0 and the distribution is centered on $\beta$ for all $n$, we have shown that $\hat\beta$ is consistent.

Alternatively, we can prove consistency as follows. We need the following result.

Lemma 1.1. $\operatorname{plim} \left( \dfrac{X'\varepsilon}{n} \right) = 0.$

Proof. First, note that $E\left( \frac{X'\varepsilon}{n} \right) = 0$ for any $n$. Then the variance of the expression $\frac{X'\varepsilon}{n}$ is given by

$$V\left( \frac{X'\varepsilon}{n} \right) = E\left[ \left( \frac{X'\varepsilon}{n} \right) \left( \frac{X'\varepsilon}{n} \right)' \right] = \frac{1}{n^2} E(X'\varepsilon\varepsilon'X) = \frac{\sigma^2}{n} \cdot \frac{X'X}{n},$$

so that $\lim_{n \to \infty} V\left( \frac{X'\varepsilon}{n} \right) = 0 \cdot Q = 0$. Since the asymptotic mean of the random variable is 0 and the asymptotic variance is 0, the probability limit of the expression is 0.
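Lemma 1.1 can be illustrated numerically. The sketch below (with made-up sample sizes, a made-up $\sigma$, and a single bounded regressor) compares the simulated variance of $X'\varepsilon/n$ with the formula $\frac{\sigma^2}{n} \cdot \frac{X'X}{n}$ and shows it vanishing as $n$ grows.

```python
import numpy as np

# Numerical sketch of Lemma 1.1 (illustrative values only): the variance of
# X'eps/n matches (sigma^2/n)(X'X/n) and shrinks toward zero as n grows.
rng = np.random.default_rng(1)
sigma = 1.5
results = {}
for n in (100, 10_000):
    x = rng.uniform(-1.0, 1.0, size=n)        # one bounded regressor
    sims = np.array([x @ rng.normal(scale=sigma, size=n) / n
                     for _ in range(3000)])
    results[n] = (sims.var(), (sigma**2 / n) * (x @ x / n))

for n, (empirical, theory) in results.items():
    print(n, empirical, theory)
```

Multiplying $n$ by 100 should shrink the variance by roughly a factor of 100, which is exactly the $1/n$ rate the lemma exploits.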

Now we can state a slightly more direct proof of consistency of the OLS estimator, which is

$$\operatorname{plim}(\hat\beta) = \operatorname{plim}\left( \beta + (X'X)^{-1} X'\varepsilon \right) = \beta + \lim_{n \to \infty} \left( \frac{X'X}{n} \right)^{-1} \operatorname{plim}\left( \frac{X'\varepsilon}{n} \right) = \beta + Q^{-1} \cdot 0 = \beta.$$

Next, consider whether or not $s^2$ is a consistent estimator of $\sigma^2$. Now $s^2 = \frac{SSE}{n-k}$, where $SSE = (y - X\hat\beta)'(y - X\hat\beta)$. We showed that $E(s^2) = \sigma^2$ for all $n$; that is, $s^2$ is an unbiased estimator of $\sigma^2$ for all sample sizes. Since $SSE = \varepsilon'M\varepsilon$, with $M = I - X(X'X)^{-1}X'$, then

$$\operatorname{plim} s^2 = \operatorname{plim} \frac{\varepsilon'M\varepsilon}{n-k} = \operatorname{plim} \frac{\varepsilon'M\varepsilon}{n} = \operatorname{plim} \frac{\varepsilon'\varepsilon}{n} - \operatorname{plim} \frac{\varepsilon'X}{n} \left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{n} = \operatorname{plim} \frac{\varepsilon'\varepsilon}{n} - 0 \cdot Q^{-1} \cdot 0.$$

Now

$$E\left( \frac{\varepsilon'\varepsilon}{n} \right) = \frac{1}{n} \sum_i E(\varepsilon_i^2) = \frac{1}{n}(n\sigma^2) = \sigma^2.$$

Similarly, under the assumption that $\varepsilon_i$ is i.i.d., the variance of the random variable being considered is given by

$$V\left( \frac{\varepsilon'\varepsilon}{n} \right) = \frac{1}{n^2} V\left( \sum_i \varepsilon_i^2 \right) = \frac{1}{n^2} \sum_i V(\varepsilon_i^2) = \frac{1}{n^2} \cdot n \left[ E(\varepsilon_i^4) - V(\varepsilon_i)^2 \right] = \frac{1}{n} \left[ E(\varepsilon_i^4) - V(\varepsilon_i)^2 \right],$$

so that the limit of the variance of $\frac{\varepsilon'\varepsilon}{n}$ is 0 as long as $E(\varepsilon_i^4)$ is finite [we have already assumed that the first two moments of the distribution of $\varepsilon_i$ exist]. Thus the asymptotic distribution of $\frac{\varepsilon'\varepsilon}{n}$ is centered at $\sigma^2$ and is degenerate, thus proving consistency of $s^2$.

2. Testing without Normally Distributed Disturbances

In this section we look at the distribution of test statistics associated with linear restrictions on the $\beta$ vector when $\varepsilon_i$ is not assumed to be normally distributed as $N(0, \sigma^2)$ for all $i$. Instead, we will proceed with the weaker condition that $\varepsilon_i$ is independently and identically distributed with the common cumulative distribution function (c.d.f.) $F$. Furthermore, $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$ for all $i$. Since we retain the mean independence and homogeneity assumptions, and since unbiasedness, consistency, and the Gauss-Markov theorem for that matter all rely only on these first two conditional moment assumptions, all these results continue to hold when we drop normality. However, the small-sample distributions of our test statistics will no longer be accurate, since these were all derived under the assumption of normality. If we made other explicit assumptions regarding $F$, it would be possible in principle to derive the small-sample distributions of test statistics, though these distributions are not simple to characterize analytically or even to compute. Instead of making explicit assumptions regarding the form of $F$, we can derive distributions of test statistics which are valid for large $n$ no matter what the exact form of $F$ [except that it must be a member of the class of distributions for which the asymptotic results are valid, of course]. We begin with the following useful lemma, which is associated with Lindeberg-Levy.
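Before turning to the lemma, the consistency of $s^2$ established in Section 1 is easy to verify numerically, and the check works just as well with deliberately non-normal disturbances (only the first and fourth moments matter). The sketch below uses centered exponential errors with variance 1; the sample size and coefficients are made up.

```python
import numpy as np

# Quick numerical check that s^2 = SSE/(n-k) is consistent for sigma^2,
# using non-normal (centered exponential) disturbances with sigma^2 = 1
# and finite fourth moment.  All specific numbers are invented.
rng = np.random.default_rng(2)
n, k = 50_000, 2
beta = np.array([2.0, -1.0])
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
eps = rng.exponential(1.0, size=n) - 1.0   # mean 0, variance 1, E(eps^4) = 9

y = X @ beta + eps
b = np.linalg.solve(X.T @ X, X.T @ y)      # OLS estimate
s2 = np.sum((y - X @ b) ** 2) / (n - k)
print(s2)                                  # should be close to sigma^2 = 1
```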

Lemma 2.1. If $\varepsilon$ is i.i.d. with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = \sigma^2$ for all $i$; if the elements of the matrix $X$ are uniformly bounded, so that $|X_{ij}| < U$ for all $i$ and $j$ and for $U$ finite; and if $\lim_{n \to \infty} \frac{X'X}{n} = Q$ is finite and nonsingular, then

$$\frac{1}{\sqrt{n}} X'\varepsilon \xrightarrow{d} N(0, \sigma^2 Q).$$

Proof. Consider the case of only one regressor for simplicity, so that $Z_n \equiv \frac{1}{\sqrt{n}} \sum_i X_i \varepsilon_i$ is a scalar. Let $G_i$ be the c.d.f. of $X_i \varepsilon_i$, and let

$$S_n^2 \equiv \sum_{i=1}^n V(X_i \varepsilon_i) = \sigma^2 \sum_{i=1}^n X_i^2.$$

In this scalar case, $Q = \lim_{n \to \infty} \frac{1}{n} \sum_i X_i^2$. By the Lindeberg-Feller theorem, the necessary and sufficient condition for $Z_n \xrightarrow{d} N(0, \sigma^2 Q)$ is

$$\lim_{n \to \infty} \frac{1}{S_n^2} \sum_{i=1}^n \int_{|\omega| > \nu S_n} \omega^2 \, dG_i(\omega) = 0 \tag{2.1}$$

for all $\nu > 0$. Now $G_i(\omega) = F(\omega / X_i)$. Then rewrite [2.1] as

$$\lim_{n \to \infty} \frac{1}{S_n^2} \sum_{i=1}^n X_i^2 \int_{|\omega / X_i| > \nu S_n / |X_i|} \left( \frac{\omega}{X_i} \right)^2 dF\left( \frac{\omega}{X_i} \right) = 0.$$

Since $\lim \frac{S_n^2}{n} = \lim \sigma^2 \frac{1}{n} \sum_i X_i^2 = \sigma^2 Q$, we have $\lim \frac{n}{S_n^2} = (\sigma^2 Q)^{-1}$, which is a finite and nonzero scalar. Then we need to show

$$\lim_{n \to \infty} \frac{1}{n} \sum_i X_i^2 \, \delta_{i,n} = 0, \quad \text{where } \delta_{i,n} \equiv \int_{|\omega / X_i| > \nu S_n / |X_i|} \left( \frac{\omega}{X_i} \right)^2 dF\left( \frac{\omega}{X_i} \right).$$

Now $\lim_{n \to \infty} \delta_{i,n} = 0$ for all $i$ and any fixed $\nu$, since $X_i$ is bounded while $\lim S_n = \infty$ [thus the measure of the set $|\omega / X_i| > \nu S_n / |X_i|$ goes to 0 asymptotically]. Since $\lim \frac{1}{n} \sum_i X_i^2$ is finite and $\lim \delta_{i,n} = 0$ for all $i$, $\lim \frac{1}{n} \sum_i X_i^2 \, \delta_{i,n} = 0$.
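The conclusion of Lemma 2.1 can be seen in a simulation. The sketch below works in the one-regressor case with bounded (uniform) regressors and Rademacher disturbances, a deliberately non-normal choice with $\sigma^2 = 1$; the sample size and replication count are invented. After standardizing by $\sqrt{\sigma^2 Q}$, roughly 95% of the simulated values of $n^{-1/2} X'\varepsilon$ should fall inside $\pm 1.96$.

```python
import numpy as np

# Illustrative sketch of Lemma 2.1 (scalar case): n^{-1/2} X'eps is
# approximately N(0, sigma^2 Q) even when eps is non-normal.  Here eps is
# Rademacher (+1/-1), so sigma^2 = 1; the other numbers are made up.
rng = np.random.default_rng(3)
n, sigma = 2_000, 1.0
x = rng.uniform(-1.0, 1.0, size=n)          # uniformly bounded regressors
Q_hat = x @ x / n                           # finite-sample analogue of Q

sims = np.array([x @ (2.0 * rng.integers(0, 2, size=n) - 1.0) / np.sqrt(n)
                 for _ in range(20_000)])
z = sims / np.sqrt(sigma**2 * Q_hat)        # standardize by sqrt(sigma^2 Q)
print(np.mean(np.abs(z) < 1.96))            # close to 0.95 if z is approx N(0,1)
```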

For vector-valued $X_i$ the result is identical, of course, with $Q$ being $k \times k$ instead of a scalar; the proof is only slightly more involved. Now we can prove the following important result.

Theorem 2.2. Under the conditions of the lemma,

$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1}).$$

Proof. $\sqrt{n}(\hat\beta - \beta) = \left( \frac{X'X}{n} \right)^{-1} \frac{1}{\sqrt{n}} X'\varepsilon$. Since $\lim \left( \frac{X'X}{n} \right)^{-1} = Q^{-1}$ and $\frac{1}{\sqrt{n}} X'\varepsilon \xrightarrow{d} N(0, \sigma^2 Q)$, then

$$\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1} Q Q^{-1}) = N(0, \sigma^2 Q^{-1}).$$

The results of this proof have the following practical implications. For small $n$, the distribution of $\sqrt{n}(\hat\beta - \beta)$ is not normal, though asymptotically the distribution of this random variable converges to a normal. The variance of this random variable converges to $\sigma^2 Q^{-1}$, which is arbitrarily well approximated by $s^2 \left( \frac{X'X}{n} \right)^{-1}$. But the variance of $(\hat\beta - \beta)$ is equal to the variance of $\sqrt{n}(\hat\beta - \beta)$ divided by $n$, so that in large samples the variance of the OLS estimator is approximately equal to $s^2 \left( \frac{X'X}{n} \right)^{-1} / n = s^2 (X'X)^{-1}$, even when $F$ is non-normal.

Usual $t$ tests of one linear restriction on $\beta$ are no longer exactly valid. However, an analogous large-sample test is readily available.

Proposition 2.3. Let $\varepsilon_i$ be i.i.d. $(0, \sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test $H_0: R\beta = r$, where $R$ is $(1 \times k)$ and $r$ is a scalar, both known. Then

$$\frac{R\hat\beta - r}{\sqrt{s^2 R (X'X)^{-1} R'}} \xrightarrow{d} N(0, 1).$$

Proof. Under the null, $R\hat\beta - r = R\hat\beta - R\beta = R(\hat\beta - \beta)$, so that the test statistic is

$$\frac{\sqrt{n}\, R(\hat\beta - \beta)}{\sqrt{s^2 R (X'X/n)^{-1} R'}}.$$

Since $\sqrt{n}(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$,

$$\sqrt{n}\, R(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 R Q^{-1} R').$$
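Proposition 2.3 can be checked by simulation: under the null, the statistic should reject at roughly the nominal 5% rate even with skewed, non-normal errors. The design, sample size, and null hypothesis below are all invented for the sketch.

```python
import numpy as np

# Sketch of Proposition 2.3: under H0, (R beta_hat - r)/sqrt(s^2 R (X'X)^{-1} R')
# is approximately N(0,1) even with non-normal errors, so a |z| > 1.96 rule
# rejects about 5% of the time.  Setup and numbers are illustrative.
rng = np.random.default_rng(4)
n, k = 500, 2
beta = np.array([1.0, 0.5])
R_ = np.array([0.0, 1.0])                    # hypothetical restriction: beta_2 = 0.5
r = R_ @ beta                                # null is true by construction
X = np.column_stack([np.ones(n), rng.uniform(-1.0, 1.0, size=n)])
XtX_inv = np.linalg.inv(X.T @ X)

reps, reject = 5_000, 0
for _ in range(reps):
    eps = rng.exponential(1.0, size=n) - 1.0  # skewed errors, mean 0, var 1
    y = X @ beta + eps
    b = XtX_inv @ (X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (n - k)
    z = (R_ @ b - r) / np.sqrt(s2 * R_ @ XtX_inv @ R_)
    reject += abs(z) > 1.96
print(reject / reps)                          # should be near 0.05
```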

The denominator of the test statistic has a probability limit equal to $\sqrt{\sigma^2 R Q^{-1} R'}$, which is the standard deviation of the random variable in the numerator. A mean-zero normal random variable divided by its standard deviation has the distribution $N(0, 1)$.

A similar result holds for the situation in which multiple (nonredundant) linear restrictions on $\beta$ are tested simultaneously.

Proposition 2.4. Let $\varepsilon_i$ be i.i.d. $(0, \sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test $H_0: R\beta = r$, where $R$ is $(m \times k)$ and $r$ is an $(m \times 1)$ vector, both known. Then

$$\frac{(r - R\hat\beta)' \left[ R(X'X)^{-1} R' \right]^{-1} (r - R\hat\beta) / m}{SSE / (n - k)} \xrightarrow{d} \frac{\chi^2_m}{m}.$$

Proof. The denominator is a consistent estimator of $\sigma^2$ [as would be $SSE/n$], and has a degenerate limiting distribution. Under the null hypothesis, $r - R\hat\beta = -R(X'X)^{-1} X'\varepsilon$, so that the numerator of the test statistic, divided by $\sigma^2$, can be written $\frac{\varepsilon' D \varepsilon}{m \sigma^2}$, where

$$D \equiv X(X'X)^{-1} R' \left[ R(X'X)^{-1} R' \right]^{-1} R(X'X)^{-1} X'.$$

Now $D$ is symmetric and idempotent with $\rho(D) = m$. Then write

$$\frac{\varepsilon' D \varepsilon}{m \sigma^2} = \frac{\varepsilon' P P' D P P' \varepsilon}{m \sigma^2} = \frac{1}{m} V' \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix} V = \frac{1}{m} \sum_{i=1}^m V_i^2,$$

where $P$ is the orthogonal matrix such that $P' D P = \begin{pmatrix} I_m & 0 \\ 0 & 0 \end{pmatrix}$ and where $V = P'\varepsilon / \sigma$. Thus the $V_i$ have mean 0 and variance 1. Because $V = P'\varepsilon / \sigma$,

$$V_i = \sum_{j=1}^n \frac{P_{ji} \varepsilon_j}{\sigma}, \quad i = 1, \ldots, m.$$

The terms in the summand are independent random variables with mean 0 and variance $\sigma_j^2 = P_{ji}^2$. Since the $\varepsilon_j$ are i.i.d., the central limit theorem applies, so that

$$\frac{1}{W_n} \sum_{j=1}^n \frac{P_{ji} \varepsilon_j}{\sigma} \xrightarrow{d} N(0, 1),$$

where $W_n = \sqrt{\sum_{j=1}^n \sigma_j^2} = \sqrt{\sum_{j=1}^n P_{ji}^2} = 1$ because $P$ is orthogonal. Then since each $V_i$ is asymptotically standard normal, $\frac{1}{m} \sum_{i=1}^m V_i^2 \xrightarrow{d} \frac{\chi^2_m}{m}$.

The practical use of this theorem is as follows. For large samples, the sampling distribution of the statistic satisfies

$$\frac{(r - R\hat\beta)' \left[ R(X'X)^{-1} R' \right]^{-1} (r - R\hat\beta) / m}{SSE / (n - k)} \xrightarrow{d} \frac{\chi^2_m}{m}, \tag{2.2}$$

which means that for large enough $n$,

$$\frac{(r - R\hat\beta)' \left[ R(X'X)^{-1} R' \right]^{-1} (r - R\hat\beta)}{SSE / (n - k)} \stackrel{a}{\sim} \chi^2_m. \tag{2.3}$$

Now when the disturbances were normally distributed, in a sample of size $n$ the same test statistic given by the left-hand side of [2.2] was distributed as an $F(m, n-k)$. Note that $\lim_{n \to \infty} F(x; m, n-k) = \chi^2_m(mx)$, where $F(\cdot; m, n-k)$ and $\chi^2_m(\cdot)$ denote the respective c.d.f.s. For example, say that the test statistic associated with a null with $m = 3$ restrictions assumed the value 4. In a sample of size $n = 8000$, we have (approximately) $1 - F(4; 3, 8000) = .00741$. The asymptotic approximation given in [2.3] in this example yields $1 - \chi^2_3(3 \cdot 4) = .00738$. In small samples, differences are much greater, of course. For example, for the same value of the test statistic, when $n = 20$ we have $1 - F(4; 3, 20 - 3) = .02523$, which is certainly different than $1 - \chi^2_3(3 \cdot 4) = .00738$.

In summary, when the sample size is very large, the normality assumption is pretty much inconsequential in the testing of linear restrictions on the parameter vector $\beta$. In small samples, some given assumption as to the form of $F(\varepsilon)$ is generally required to compute the distribution of the estimator $\hat\beta$. Under normality, the small-sample distributions of test statistics follow the $t$ or $F$, depending on the number of restrictions being tested. Testing in this environment depends critically on the normality assumption, and if the disturbances are not normally distributed, tests will be biased in general.
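The tail probabilities quoted in the example are easy to reproduce with scipy's $F$ and chi-square distributions:

```python
from scipy.stats import f, chi2

# Reproducing the tail comparison from the text: a statistic of 4 with m = 3
# restrictions, under the exact F distribution versus the chi-square
# approximation chi2_3 evaluated at 3 * 4 = 12.
m, stat = 3, 4.0
p_large = f.sf(stat, m, 8000)      # 1 - F(4; 3, 8000), roughly .00741
p_chi2 = chi2.sf(m * stat, m)      # 1 - chi2_3(12),    roughly .00738
p_small = f.sf(stat, m, 20 - 3)    # 1 - F(4; 3, 17),   roughly .02523
print(p_large, p_chi2, p_small)
```

The large-sample $F$ tail and the chi-square tail agree to several decimal places, while the $n = 20$ tail is more than three times larger, which is the point of the small-sample caveat.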

The terms i the summad are idepedet radom variables with mea 0 ad variace σ 2 j = P 2 ji. Sice the ε j are i.i.d., the cetral limit theorem applies, so that X j=1 P ji ε j /σ W N(0, 1), q P q P where W = j=1 σ2 j = j=1 P ji 2 =1because P is orthogoal. The sice P each V i is stadard ormal, 1 m m V i 2 χ2 m m. The practical use of this theorem is as follows. For large samples, the sample distributio of the statistic (r Rˆβ) 0 [R(X 0 X) 1 R 0 ] 1 (r Rˆβ)/m SSE/( k) χ2 m m, (2.2) which meas that for large eough (r Rˆβ) 0 [R(X 0 X) 1 R 0 ] 1 (r Rˆβ) SSE/( k) χ 2 m. (2.3) Now whe disturbaces were ormally distributed, i a sample of size we have the same test statistic give by the left-had side of [2.2] was distributed as a F (m, k). Note that lim F (x; m, k) is χ2 m (x). For example, say that the m test statistic associated with a ull with (m) 3 restrictios assumed the value 4. Iasamplesizeof = 8000, we have (approximately) 1 F (4; 3, 8000) =.00741. The asymptotic approximatio give i [2.3] i this example yields 1 χ 2 3(3 4) =.00738. Ismallsamples, differeces are much greater of course. For example, for the same value of the test statistic, whe =20we have 1 F (4; 3, 20 3) =.02523, which is certaily differet tha 1 χ 2 3(3 4) =.00738. I summary, whe the sample size is very large, the ormality assumptio is pretty much icosequetial i the testig of liear restrictios o the parameter vector β. I small samples, some give assumptio as to the form of F (ε) is geerally required to compute the distributio of the estimator ˆβ. Uder ormality, the small sample distributios of test statistics follow the t or F, depedig o the umber of restrictios beig tested. Testig i this eviromet depeds critically o the ormality assumptio, ad if the disturbaces are ot ormally distributed, tests will be biased i geeral. 8