Asymptotic Results for the Linear Regression Model

C. Flinn

November 29, 2000

1. Asymptotic Results under Classical Assumptions

The following results apply to the linear regression model $y = X\beta + \varepsilon$, where $X$ is of dimension $(n \times k)$, $\varepsilon$ is an (unknown) $(n \times 1)$ vector of disturbances, and $\beta$ is an (unknown) $(k \times 1)$ parameter vector. We assume that $n > k$, and that $\rho(X) = k$. This implies that $\rho(X'X) = k$ as well. Throughout we assume that the classical conditional moment assumptions apply, namely
$$E(\varepsilon_i \mid X) = 0 \quad \forall i, \qquad V(\varepsilon_i \mid X) = \sigma^2 \quad \forall i.$$
We first show that the probability limit of the OLS estimator is $\beta$, i.e., that it is consistent. In particular, we know that
$$\hat\beta = \beta + (X'X)^{-1}X'\varepsilon,$$
so that
$$E(\hat\beta \mid X) = \beta + (X'X)^{-1}X'E(\varepsilon \mid X) = \beta.$$
In terms of the (conditional) variance of the estimator $\hat\beta$,
$$V(\hat\beta \mid X) = \sigma^2 (X'X)^{-1}.$$
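The unbiasedness property $E(\hat\beta \mid X) = \beta$ can be illustrated with a small Monte Carlo sketch: holding the design $X$ fixed and redrawing the disturbances, the average of the OLS estimates should sit close to the true $\beta$. All numerical choices below ($n$, $k$, $\beta$, $\sigma$) are illustrative assumptions, not values from the text.

```python
# Monte Carlo check of E(beta_hat | X) = beta: X is held fixed across
# replications, only the disturbances are redrawn each time.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3                          # illustrative sample size and regressors
beta = np.array([1.0, -2.0, 0.5])      # illustrative true parameter vector
X = rng.normal(size=(n, k))            # fixed design matrix
XtX_inv = np.linalg.inv(X.T @ X)

reps = 5000
draws = np.empty((reps, k))
for r in range(reps):
    eps = rng.normal(scale=2.0, size=n)    # sigma = 2, E(eps | X) = 0
    y = X @ beta + eps
    draws[r] = XtX_inv @ X.T @ y           # OLS: (X'X)^{-1} X'y

bias = np.abs(draws.mean(axis=0) - beta).max()
print(bias)    # largest componentwise deviation from beta; should be near 0
```

The sampling noise in the averaged estimate shrinks like $1/\sqrt{\text{reps}}$, so with 5000 replications the deviation from $\beta$ is an order of magnitude smaller than any single estimate's standard error.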
Now we will rely heavily on the following assumption:
$$\lim_{n\to\infty} \frac{X'X}{n} = Q,$$
where $Q$ is a finite, nonsingular $k \times k$ matrix. Then we can write the covariance of $\hat\beta$ in a sample of size $n$ explicitly as
$$V(\hat\beta \mid X_n) = \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1},$$
so that
$$\lim_{n\to\infty} V(\hat\beta \mid X_n) = \lim_{n\to\infty}\frac{\sigma^2}{n} \cdot \left(\lim_{n\to\infty}\frac{X'X}{n}\right)^{-1} = 0 \cdot Q^{-1} = 0.$$
Since the asymptotic variance of the estimator is $0$ and the distribution is centered on $\beta$ for all $n$, we have shown that $\hat\beta$ is consistent.

Alternatively, we can prove consistency as follows. We need the following result.

Lemma 1.1. $\operatorname{plim}\left(\dfrac{X'\varepsilon}{n}\right) = 0.$

Proof. First, note that $E\left(\frac{X'\varepsilon}{n}\right) = 0$ for any $n$. Then the variance of the expression $\frac{X'\varepsilon}{n}$ is given by
$$V\left(\frac{X'\varepsilon}{n}\right) = E\left[\left(\frac{X'\varepsilon}{n}\right)\left(\frac{X'\varepsilon}{n}\right)'\right] = \frac{1}{n^2}E(X'\varepsilon\varepsilon'X) = \frac{\sigma^2}{n}\cdot\frac{X'X}{n},$$
so that
$$\lim_{n\to\infty} V\left(\frac{X'\varepsilon}{n}\right) = 0 \cdot Q = 0.$$
Since the asymptotic mean of the random variable is $0$ and the asymptotic variance is $0$, the probability limit of the expression is $0$.
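Lemma 1.1 can be checked numerically: the components of $X'\varepsilon/n$ should shrink toward zero at roughly the rate $1/\sqrt{n}$. The design distribution, $\sigma$, and the two sample sizes below are illustrative assumptions.

```python
# Numerical sketch of Lemma 1.1: plim X'eps/n = 0. The average magnitude
# of X'eps/n at n = 10000 should be roughly 10x smaller than at n = 100.
import numpy as np

rng = np.random.default_rng(1)
k, sigma = 2, 1.5    # illustrative choices

def mean_max_component(n, reps=500):
    """Average over replications of max_j |(X'eps/n)_j| at sample size n."""
    total = 0.0
    for _ in range(reps):
        X = rng.normal(size=(n, k))
        eps = rng.normal(scale=sigma, size=n)
        total += np.abs(X.T @ eps / n).max()
    return total / reps

small_n, large_n = mean_max_component(100), mean_max_component(10_000)
print(small_n, large_n)
```

Since each component of $X'\varepsilon/n$ has standard deviation of order $\sigma/\sqrt{n}$, the hundredfold increase in $n$ should cut the typical magnitude by about a factor of ten.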
Now we can state a slightly more direct proof of consistency of the OLS estimator, which is
$$\operatorname{plim}(\hat\beta) = \operatorname{plim}\left(\beta + (X'X)^{-1}X'\varepsilon\right) = \beta + \left(\lim_{n\to\infty}\frac{X'X}{n}\right)^{-1}\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = \beta + Q^{-1}\cdot 0 = \beta.$$

Next, consider whether or not $s^2$ is a consistent estimator of $\sigma^2$. Now
$$s^2 = \frac{SSE}{n-k}, \quad \text{where } SSE = (y - X\hat\beta)'(y - X\hat\beta).$$
We showed that $E(s^2) = \sigma^2$ for all $n$ — that is, that $s^2$ is an unbiased estimator of $\sigma^2$ for all sample sizes. Since $SSE = \varepsilon'M\varepsilon$, with $M = I - X(X'X)^{-1}X'$, then
$$\operatorname{plim} s^2 = \operatorname{plim}\frac{\varepsilon'M\varepsilon}{n-k} = \operatorname{plim}\frac{\varepsilon'M\varepsilon}{n} = \operatorname{plim}\frac{\varepsilon'\varepsilon}{n} - \operatorname{plim}\left(\frac{\varepsilon'X}{n}\right)\left(\lim_{n\to\infty}\frac{X'X}{n}\right)^{-1}\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = \operatorname{plim}\frac{\varepsilon'\varepsilon}{n} - 0'\,Q^{-1}\,0.$$
Now
$$E\left(\frac{\varepsilon'\varepsilon}{n}\right) = \frac{1}{n}E\left(\sum_i \varepsilon_i^2\right) = \frac{1}{n}\sum_i E(\varepsilon_i^2) = \frac{1}{n}(n\sigma^2) = \sigma^2.$$
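The claim $\operatorname{plim} s^2 = \sigma^2$ lends itself to a direct simulation sketch: at a large sample size, a single draw of $s^2 = SSE/(n-k)$ should land very close to the true variance. The parameter values below are illustrative assumptions.

```python
# Sketch of the consistency of s^2 = SSE/(n-k): one large-sample draw
# should be within sampling noise of sigma^2. Values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
k, sigma2 = 3, 4.0                       # illustrative choices
beta = np.array([1.0, 0.0, -1.0])

def s_squared(n):
    """Draw one sample of size n and return s^2 = SSE/(n-k)."""
    X = rng.normal(size=(n, k))
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = X @ beta + eps
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS coefficients
    resid = y - X @ b
    return resid @ resid / (n - k)

val = s_squared(200_000)
print(val)    # should be close to sigma2 = 4.0
```

Here the standard deviation of $s^2$ is roughly $\sqrt{2\sigma^4/n} \approx 0.013$, so a single draw already pins down $\sigma^2$ to two decimal places.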
Similarly, under the assumption that $\varepsilon_i$ is i.i.d., the variance of the random variable being considered is given by
$$V\left(\frac{\varepsilon'\varepsilon}{n}\right) = \frac{1}{n^2}V\left(\sum_i \varepsilon_i^2\right) = \frac{1}{n^2}\sum_i V(\varepsilon_i^2) = \frac{1}{n^2}\cdot n\left[E(\varepsilon_i^4) - V(\varepsilon_i)^2\right] = \frac{1}{n}\left[E(\varepsilon_i^4) - V(\varepsilon_i)^2\right],$$
so that the limit of the variance of $\frac{\varepsilon'\varepsilon}{n}$ is $0$ as long as $E(\varepsilon_i^4)$ is finite [we have already assumed that the first two moments of the distribution of $\varepsilon_i$ exist]. Thus the asymptotic distribution of $\frac{\varepsilon'\varepsilon}{n}$ is centered at $\sigma^2$ and is degenerate, thus proving consistency of $s^2$.

2. Testing without Normally Distributed Disturbances

In this section we look at the distribution of test statistics associated with linear restrictions on the $\beta$ vector when $\varepsilon_i$ is not assumed to be normally distributed as $N(0, \sigma^2)$ for all $i$. Instead, we will proceed with the weaker condition that $\varepsilon_i$ is independently and identically distributed with the common cumulative distribution function (c.d.f.) $F$. Furthermore, $E(\varepsilon_i) = 0$ and $V(\varepsilon_i) = \sigma^2$ for all $i$.

Since we retain the mean independence and variance homogeneity assumptions, and since unbiasedness, consistency, and the Gauss-Markov theorem for that matter all rely only on these first two conditional moment assumptions, all these results continue to hold when we drop normality. However, the small-sample distributions of our test statistics will no longer be accurate, since these were all derived under the assumption of normality. If we made other explicit assumptions regarding $F$, it would be possible in principle to derive the small-sample distributions of test statistics, though these distributions are not simple to characterize analytically or even to compute. Instead of making explicit assumptions regarding the form of $F$, we can derive distributions of test statistics which are valid for large $n$ no matter what the exact form of $F$ [except that it must be a member of the class of distributions for which the asymptotic results are valid, of course].

We begin with the following useful lemma, which is associated with Lindeberg–Levy.
Lemma 2.1. If $\varepsilon$ is i.i.d. with $E(\varepsilon_i) = 0$ and $E(\varepsilon_i^2) = \sigma^2$ for all $i$; if the elements of the matrix $X$ are uniformly bounded, so that $|X_{ij}| < U$ for all $i$ and $j$ and for $U$ finite; and if $\lim_{n\to\infty} \frac{X'X}{n} = Q$ is finite and nonsingular, then
$$\frac{1}{\sqrt{n}}X'\varepsilon \xrightarrow{d} N(0, \sigma^2 Q).$$

Proof. Consider the case of only one regressor for simplicity. Then
$$Z_n \equiv \frac{1}{\sqrt n}\sum_i X_i\varepsilon_i$$
is a scalar. Let $G_i$ be the c.d.f. of $X_i\varepsilon_i$, and let
$$S_n^2 \equiv \sum_i V(X_i\varepsilon_i) = \sigma^2\sum_i X_i^2.$$
In this scalar case, $Q = \lim_{n\to\infty}\frac 1n \sum_i X_i^2$. By the Lindeberg–Feller theorem, the necessary and sufficient condition for $Z_n \xrightarrow{d} N(0, \sigma^2 Q)$ is
$$\lim_{n\to\infty} \frac{1}{S_n^2}\sum_i \int_{|\omega| > \nu S_n} \omega^2\, dG_i(\omega) = 0 \qquad (2.1)$$
for all $\nu > 0$. Now $G_i(\omega) = F(\omega / X_i)$. Then rewrite [2.1] as
$$\lim_{n\to\infty} \frac{1}{S_n^2}\sum_i X_i^2 \int_{|\omega/X_i| > \nu S_n/|X_i|} \left(\frac{\omega}{X_i}\right)^2 dF\!\left(\frac{\omega}{X_i}\right) = 0.$$
Since $\lim S_n^2/n = \lim \sigma^2 \frac{\sum_i X_i^2}{n} = \sigma^2 Q$, $\lim n/S_n^2 = (\sigma^2 Q)^{-1}$, which is a finite and nonzero scalar. Then we need to show
$$\lim_{n\to\infty} \frac 1n \sum_i X_i^2\, \delta_{i,n} = 0,$$
where $\delta_{i,n} \equiv \int_{|\omega/X_i| > \nu S_n/|X_i|} \left(\frac{\omega}{X_i}\right)^2 dF\!\left(\frac{\omega}{X_i}\right)$. Now $\lim_{n\to\infty}\delta_{i,n} = 0$ for all $i$ and any fixed $\nu$, since $X_i$ is bounded while $\lim_{n\to\infty} S_n = \infty$ [thus the measure of the set $\{|\omega/X_i| > \nu S_n/|X_i|\}$ goes to $0$ asymptotically]. Since $\lim \frac 1n\sum_i X_i^2$ is finite and $\lim \delta_{i,n} = 0$ for all $i$, $\lim \frac 1n \sum_i X_i^2 \delta_{i,n} = 0$.
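The scalar case of this lemma can be illustrated by simulation: even with markedly non-normal disturbances, the standardized statistic $Z_n/\sqrt{\sigma^2 \sum_i X_i^2/n}$ should behave like a standard normal. The bounded uniform design, the centered exponential disturbances, and the sample sizes below are all illustrative assumptions.

```python
# Sketch of Lemma 2.1 (scalar case): X'eps/sqrt(n), standardized by
# sqrt(sigma^2 * X'X/n), is approximately N(0,1) even for skewed errors.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 2000, 4000
x = rng.uniform(0.5, 1.5, size=n)    # bounded regressor, fixed across reps
q = (x @ x) / n                      # sample analogue of Q

z = np.empty(reps)
for r in range(reps):
    eps = rng.exponential(1.0, size=n) - 1.0     # mean 0, variance 1, skewed
    z[r] = (x @ eps / np.sqrt(n)) / np.sqrt(q)   # sigma^2 = 1 here

tail = np.mean(np.abs(z) > 1.96)
print(tail)    # should be close to the standard normal tail mass 0.05
```

A stronger check would compare the whole empirical distribution of `z` to the normal c.d.f.; the two-sided tail frequency at $\pm 1.96$ is a quick summary of the same fact.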
For vector-valued $X_i$, the result is identical of course, with $Q$ being $k \times k$ instead of a scalar. The proof is only slightly more involved.

Now we can prove the following important result.

Theorem 2.2. Under the conditions of the lemma,
$$\sqrt n\,(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1}).$$

Proof.
$$\sqrt n\,(\hat\beta - \beta) = \left(\frac{X'X}{n}\right)^{-1} \frac{1}{\sqrt n}X'\varepsilon.$$
Since $\lim \left(\frac{X'X}{n}\right)^{-1} = Q^{-1}$ and $\frac{1}{\sqrt n}X'\varepsilon \xrightarrow{d} N(0,\sigma^2 Q)$, then
$$\sqrt n\,(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1}QQ^{-1}) = N(0, \sigma^2 Q^{-1}).$$

The results of this proof have the following practical implications. For small $n$, the distribution of $\sqrt n(\hat\beta - \beta)$ is not normal, though asymptotically the distribution of this random variable converges to a normal. The variance of this random variable converges to $\sigma^2 Q^{-1}$, which is arbitrarily well-approximated by $s^2\left(\frac{X'X}{n}\right)^{-1}$. But the variance of $(\hat\beta - \beta)$ is equal to the variance of $\sqrt n(\hat\beta - \beta)$ divided by $n$, so that in large samples the variance of the OLS estimator is approximately equal to $s^2 (X'X/n)^{-1}/n = s^2(X'X)^{-1}$, even when $F$ is non-normal.

Usual $t$ tests of one linear restriction on $\beta$ are no longer exact. However, an analogous large-sample test is readily available.

Proposition 2.3. Let $\varepsilon_i$ be i.i.d. $(0, \sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test of $H_0\colon R\beta = r$, where $R$ is $(1\times k)$ and $r$ is a scalar, both known. Then
$$\frac{R\hat\beta - r}{\sqrt{s^2 R(X'X)^{-1}R'}} \xrightarrow{d} N(0,1).$$

Proof. Under the null, $R\hat\beta - r = R\hat\beta - R\beta = R(\hat\beta - \beta)$, so that the test statistic is
$$\frac{\sqrt n\, R(\hat\beta - \beta)}{\sqrt{s^2 R(X'X/n)^{-1}R'}}.$$
Since $\sqrt n(\hat\beta - \beta) \xrightarrow{d} N(0,\sigma^2 Q^{-1})$,
$$\sqrt n\, R(\hat\beta - \beta) \xrightarrow{d} N(0, \sigma^2 R Q^{-1} R').$$
The denominator of the test statistic has a probability limit equal to $\sqrt{\sigma^2 R Q^{-1}R'}$, which is the standard deviation of the random variable in the numerator. A mean-zero normal random variable divided by its standard deviation has the distribution $N(0,1)$.

A similar result holds for the situation in which multiple (nonredundant) linear restrictions on $\beta$ are tested simultaneously.

Proposition 2.4. Let $\varepsilon_i$ be i.i.d. $(0,\sigma^2)$, $\sigma^2 < \infty$, and let $Q$ be finite and nonsingular. Consider the test of $H_0\colon R\beta = r$, where $R$ is $(m \times k)$ and $r$ is an $(m\times 1)$ vector, both known. Then
$$\frac{(r - R\hat\beta)'[R(X'X)^{-1}R']^{-1}(r - R\hat\beta)/m}{SSE/(n-k)} \xrightarrow{d} \frac{\chi^2_m}{m}.$$

Proof. The denominator is a consistent estimator of $\sigma^2$ [as would be $SSE/n$], and has a degenerate limiting distribution. Under the null hypothesis, $r - R\hat\beta = -R(X'X)^{-1}X'\varepsilon$, so that the numerator of the test statistic can be written $\varepsilon'D\varepsilon/m$, where
$$D \equiv X(X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}X'.$$
Now $D$ is symmetric and idempotent with $\rho(D) = m$. Then write
$$\frac{\varepsilon' D \varepsilon}{m\sigma^2} = \frac{\varepsilon' P P' D P P' \varepsilon}{m\sigma^2} = \frac 1m V'\begin{pmatrix} I_m & 0\\ 0 & 0\end{pmatrix}V = \frac 1m \sum_{i=1}^m V_i^2,$$
where $P$ is the orthogonal matrix such that $P'DP = \begin{pmatrix} I_m & 0\\ 0 & 0 \end{pmatrix}$ and where $V = P'\varepsilon/\sigma$. Thus the $V_i$ have mean $0$ and standard deviation $1$. Because $V = P'\varepsilon/\sigma$,
$$V_i = \sum_{j=1}^n \frac{P_{ji}\varepsilon_j}{\sigma}, \qquad i = 1,\ldots,m.$$
The terms in the summand are independent random variables with mean $0$ and variances $\sigma_j^2 = P_{ji}^2$. Since the $\varepsilon_j$ are i.i.d., the central limit theorem applies, so that
$$\frac{1}{W_n}\sum_{j=1}^n \frac{P_{ji}\varepsilon_j}{\sigma} \xrightarrow{d} N(0,1),$$
where $W_n = \sqrt{\sum_{j=1}^n \sigma_j^2} = \sqrt{\sum_{j=1}^n P_{ji}^2} = 1$ because $P$ is orthogonal. Then, since each $V_i$ is asymptotically standard normal, $\frac 1m \sum_{i=1}^m V_i^2 \xrightarrow{d} \frac{\chi^2_m}{m}$.

The practical use of this theorem is as follows. For large samples, the sampling distribution of the statistic satisfies
$$\frac{(r - R\hat\beta)'[R(X'X)^{-1}R']^{-1}(r - R\hat\beta)/m}{SSE/(n-k)} \xrightarrow{d} \frac{\chi^2_m}{m}, \qquad (2.2)$$
which means that for large enough $n$,
$$\frac{(r - R\hat\beta)'[R(X'X)^{-1}R']^{-1}(r - R\hat\beta)}{SSE/(n-k)} \overset{a}{\sim} \chi^2_m. \qquad (2.3)$$
Now when disturbances were normally distributed, in a sample of size $n$ the same test statistic given by the left-hand side of [2.2] was distributed as an $F(m, n-k)$. Note that $\lim_{n\to\infty} F(x; m, n-k) = \chi^2_m(mx)$, where $F(\cdot\,; m, n-k)$ and $\chi^2_m(\cdot)$ denote the respective c.d.f.s. For example, say that the test statistic associated with a null with $m = 3$ restrictions assumed the value $4$. In a sample of size $n = 8000$, we have (approximately) $1 - F(4; 3, 8000) = .00741$. The asymptotic approximation given in [2.3] in this example yields $1 - \chi^2_3(3 \cdot 4) = .00738$. In small samples, differences are much greater, of course. For example, for the same value of the test statistic, when $n = 20$ we have $1 - F(4; 3, 20-3) = .02523$, which is certainly different than $1 - \chi^2_3(3 \cdot 4) = .00738$.

In summary, when the sample size is very large, the normality assumption is pretty much inconsequential in the testing of linear restrictions on the parameter vector $\beta$. In small samples, some given assumption as to the form of $F(\varepsilon)$ is generally required to compute the distribution of the estimator $\hat\beta$. Under normality, the small-sample distributions of test statistics follow the $t$ or $F$, depending on the number of restrictions being tested. Testing in this environment depends critically on the normality assumption, and if the disturbances are not normally distributed, tests based on these small-sample distributions will in general not have the stated size.
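The p-value comparison in the numerical example above can be reproduced directly with `scipy.stats`. The only assumption made here is the use of $n - k = 7997$ degrees of freedom for the large-sample case (the text's formula calls for $n - k$; with $n = 8000$ the difference from using $n$ itself is negligible).

```python
# Reproducing the F vs. chi-square comparison: tail probability of an
# F(3, n-k) statistic at 4 versus the chi-square approximation P(chi2_3 > 12).
from scipy import stats

stat, m = 4.0, 3
p_chi2 = stats.chi2.sf(m * stat, df=m)       # 1 - chi2_3(12), about .00738
p_f_large = stats.f.sf(stat, m, 8000 - m)    # n = 8000: close to chi-square
p_f_small = stats.f.sf(stat, m, 20 - m)      # n = 20: noticeably larger

print(round(p_chi2, 5), round(p_f_large, 5), round(p_f_small, 5))
```

The large-sample F p-value and the chi-square p-value agree to four decimal places, while at $n = 20$ the exact F p-value is more than three times the chi-square approximation, matching the numbers quoted in the text.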