Regression and Statistical Inference

Walid Mnif (wmnif@uwo.ca)
Department of Applied Mathematics
The University of Western Ontario, London, Canada
Elements of Probability
Elements of Probability: CDF & PDF

Recall that for a given random variable (r.v.) X, the cumulative distribution function (CDF) is defined as

    F(x) = P\{X \le x\}, \quad x \in \mathbb{R}.    (1)

The probability density function (PDF):

Discrete r.v.:
    p(x) = P\{X = x\}    (2)

Continuous r.v.:
    F(a) = \int_{-\infty}^{a} f(x)\,dx    (3)
Elements of Probability: CDF & PDF

Example: Normal distribution with mean \mu and variance \sigma^2: N(\mu, \sigma^2)

    f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-(x-\mu)^2/(2\sigma^2)\right)    (4)

Figure 1: PDF of the standard normal r.v. (\mu = 0, \sigma = 1)
Elements of Probability: CDF & PDF

Example 2: Chi-square distribution with k degrees of freedom

The distribution of a sum of the squares of k independent standard normal random variables.

PDF:
    f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2} for x > 0, and 0 elsewhere,    (5)

where \Gamma is the Gamma function, defined as

    \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt.    (6)

Exercise: Find out the characteristics of the Gamma function.
Elements of Probability: CDF & PDF

Example 3: t-distribution

Defined as the distribution of

    \frac{Z}{\sqrt{V/\nu}},    (7)

where Z \sim N(0, 1), V \sim \chi^2_{\nu}, and Z and V are independent.

PDF:
    f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}    (8)
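The following is a small simulation sketch (not part of the original slides; the degrees of freedom, sample size, and seed are arbitrary choices) checking the definition in (7): a ratio Z/\sqrt{V/\nu} built from independent normal and chi-square draws should match scipy's Student-t distribution.

```python
# Draw Z ~ N(0,1) and V ~ chi^2_nu independently, form T = Z / sqrt(V/nu),
# and compare a sample quantile with scipy's Student-t quantile.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nu = 5                                   # degrees of freedom (arbitrary choice)
z = rng.standard_normal(100_000)         # Z ~ N(0, 1)
v = rng.chisquare(nu, 100_000)           # V ~ chi^2_nu, independent of Z
t_samples = z / np.sqrt(v / nu)          # T = Z / sqrt(V / nu)

# Sample vs. theoretical 97.5% quantile; they should agree closely.
print(np.quantile(t_samples, 0.975), stats.t.ppf(0.975, df=nu))
```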
Elements of Probability: Moments

k-th moment about the origin:
    m_k = E[X^k]    (9)

k-th central moment:
    \mu_k = E[(X - E[X])^k] = E[(X - m_1)^k]    (10)

Example: X \sim N(\mu, \sigma^2): m_1 = \mu, \mu_1 = 0, \mu_2 = \sigma^2.
Elements of Probability: Moments

Theorem. If E[|X|^k] < \infty for some positive integer k, then E[|X|^j] < \infty for j = 1, 2, ..., k-1.

In other words: the existence of the k-th moment implies the existence of all moments of lower order.
Elements of Probability: Moment Generating Functions

Definition. Let X be a r.v. with pdf f. The moment generating function (MGF) of X, denoted by M_X(t), is

    M_X(t) = E[e^{tX}],

provided E[e^{tX}] < \infty for all values t in an interval (-\epsilon, \epsilon), \epsilon > 0. Explicitly,

Continuous case: M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx
Discrete case:   M_X(t) = \sum_x e^{tx} f(x)
Elements of Probability: Moment Generating Functions

Theorem. If X has MGF M_X(t), then

    E[X^n] = M_X^{(n)}(0) = \left.\frac{d^n}{dt^n} M_X(t)\right|_{t=0}    (11)

Example: Suppose X \sim N(\mu, \sigma^2); then

    M_X(t) = e^{t\mu + t^2\sigma^2/2}    (12)

Proof. Exercise.
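As a hedged illustration of (11) and (12), not part of the slides and assuming sympy is available, one can differentiate the normal MGF symbolically and evaluate at t = 0 to recover the first two moments.

```python
# Differentiate M_X(t) = exp(t*mu + t^2*sigma^2/2) at t = 0 to recover moments.
import sympy as sp

t, mu = sp.symbols('t mu')
sigma = sp.symbols('sigma', positive=True)
M = sp.exp(t * mu + t**2 * sigma**2 / 2)   # MGF of N(mu, sigma^2)

m1 = sp.diff(M, t, 1).subs(t, 0)           # E[X]   -> mu
m2 = sp.diff(M, t, 2).subs(t, 0)           # E[X^2] -> mu^2 + sigma^2
print(sp.simplify(m1), sp.simplify(m2))
```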
Elements of Probability: Moment Generating Functions

Theorem. Let X and Y be r.v.s that have the same MGF. Then X and Y have the same distribution.

Theorem. Assume X is a r.v. with MGF M_X(t). Then the MGF of Y = aX + b is

    M_Y(t) = e^{bt} M_X(at)    (13)

for any value t such that M_X(at) exists.
Elements of Probability: Markov's Inequality

Proposition. If X is a nonnegative r.v., then for all \epsilon > 0,

    P\{X \ge \epsilon\} \le \frac{E[X]}{\epsilon}.    (14)

Proof. Exercise! (Hint: Define the r.v. Y such that Y = \epsilon if X \ge \epsilon and 0 elsewhere.)

Corollary. (Chebyshev's Inequality) If X is a r.v. with mean \mu and variance \sigma^2, then for all k > 0,

    P\{|X - \mu| \ge k\sigma\} \le \frac{1}{k^2}.    (15)

Proof. Exercise! (Hint: Use Markov's Inequality.)
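A quick Monte Carlo sanity check of Chebyshev's inequality (15), added for illustration; the Exp(1) distribution, sample size, and values of k below are arbitrary choices.

```python
# For X ~ Exp(1), compare P(|X - mu| >= k*sigma) with Chebyshev's bound 1/k^2.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, 1_000_000)      # mean 1, variance 1
mu, sigma = 1.0, 1.0
for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, empirical, 1 / k**2)        # empirical probability <= 1/k^2
```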
Elements of Probability: Weak Law of Large Numbers

Theorem. Let X_1, X_2, ... be a sequence of independent and identically distributed r.v.s with E[X_i] = \mu and Var(X_i) = \sigma^2 < \infty. Then, for all \epsilon > 0,

    P\left(\left|\frac{X_1 + ... + X_n}{n} - \mu\right| > \epsilon\right) \to 0 \quad \text{as } n \to \infty.    (16)

Proof. Exercise! (Hint: Use Chebyshev's Inequality.)

Remark: The Weak Law of Large Numbers still holds when Var(X_i) = \infty (a finite mean suffices).
Elements of Probability: Strong Law of Large Numbers

Theorem 1. Let X_1, X_2, ... be a sequence of independent and identically distributed r.v.s with E[X_i] = \mu and Var(X_i) = \sigma^2 < \infty. Then, for all \epsilon > 0,

    P\left(\lim_{n \to \infty} \left|\frac{X_1 + ... + X_n}{n} - \mu\right| < \epsilon\right) = 1.    (17)
Elements of Probability: Strong Law of Large Numbers

Figure 2: Testing the Law of Large Numbers for X_i \sim N(4, 2)
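A sketch along the lines of Figure 2 (the exact parameterization of N(4, 2) in the figure is not stated, so the standard deviation \sqrt{2} below is an assumption): the running sample mean settles around \mu = 4.

```python
# Running sample means of X_i ~ N(4, 2) converge to mu = 4.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(loc=4.0, scale=np.sqrt(2.0), size=n)
running_mean = np.cumsum(x) / np.arange(1, n + 1)

plt.plot(running_mean, label='running mean')
plt.axhline(4.0, color='k', linestyle='--', label='mu = 4')
plt.xlabel('n')
plt.legend()
plt.show()
```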
Elements of Probability: Central Limit Theorem

Theorem. Let \{X_i\} be i.i.d. random vectors in \mathbb{R}^d with E[X_i] = \mu and Var[X_i] = \Sigma < \infty. Then

    \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \xrightarrow{D} N(0, \Sigma) \quad \text{as } n \to \infty.    (18)

Comment: The CLT is applicable to any arbitrary probability distribution.

Proof. See Kallenberg (1997).
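An illustrative numerical check of the comment above (any starting distribution works): standardized sums of i.i.d. Exp(1) draws are close to N(0, 1). The sample sizes and the Kolmogorov-Smirnov check are arbitrary choices, not part of the slides.

```python
# Sums of i.i.d. Exp(1) variables, centred and scaled by sqrt(n), look standard normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 50, 20_000
x = rng.exponential(1.0, size=(reps, n))          # Exp(1): mu = 1, sigma^2 = 1
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)           # (1/sqrt(n)) * sum(X_i - mu)

# Kolmogorov-Smirnov distance to N(0, 1) should be small for moderate n.
print(stats.kstest(z, 'norm'))
```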
Elements of Probability: Continuous Mapping Theorem

Theorem. (Continuous mapping 1) If X_n \xrightarrow{P} a and g is a function continuous at a, then

    g(X_n) \xrightarrow{P} g(a) \quad \text{as } n \to \infty.

Theorem. (Continuous mapping 2) If X_n \xrightarrow{D} X and g(\cdot) is a continuous function, then

    g(X_n) \xrightarrow{D} g(X) \quad \text{as } n \to \infty.
Elements of Probability: Slutsky's Theorem

Theorem. (Slutsky's Theorem) Suppose that X_n \xrightarrow{D} X and Y_n \xrightarrow{P} Y, where Y is a constant. Then,

1. X_n + Y_n \xrightarrow{D} X + Y
2. X_n Y_n \xrightarrow{D} XY
3. Y_n^{-1} X_n \xrightarrow{D} Y^{-1} X, provided Y is invertible.
Linear regression
Linear regression: Least Squares Fitting

Construct a straight line with equation y = b_0 + b_1 x that "best fits" the data (x_1, y_1), ..., (x_n, y_n).

Denote the i-th fitted (or predicted) value by

    \hat{y}_i = b_0 + b_1 x_i, \quad i = 1, ..., n,

and the i-th residual by

    e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i).
Linear regression: Least Squares Fitting

Define SSE = \sum_{1 \le i \le n} e_i^2.

SSE is a measure of goodness of fit to the data. When SSE = 0, all points y_i lie on the line b_0 + b_1 x_i.

The method of least squares:

    (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{(b_0, b_1) \in \mathbb{R}^2} SSE    (19)
Linear regression: Least Squares Fitting

Proposition. Define

    \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \quad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2,
    S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).

The line that best fits the data under the least squares criterion is

    y = \hat{\beta}_0 + \hat{\beta}_1 x,
where

    \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.

Proof. Exercise.
Hint: Define r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} (the sample correlation coefficient) and write the SSE as a sum of three squares. Show that

    SSE = S_{yy}(1 - r^2) + (b_1 \sqrt{S_{xx}} - r\sqrt{S_{yy}})^2 + n(\bar{y} - b_0 - b_1 \bar{x})^2.

Remarks:

Under the least squares criterion, SSE = S_{yy}(1 - r^2).

SSE is a measure of unexplained variability (error that could not be explained by the model).

r^2 is called the coefficient of determination and is denoted by R^2.

\hat{\beta}_1 = r\sqrt{S_{yy}/S_{xx}}: the sample correlation and the estimated slope have the same sign.

Exercise: Show that \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 and interpret the meaning of each term.
Linear regression: Least Squares Fitting

Define:

Total sum of squares SST = \sum_{i=1}^{n} (y_i - \bar{y})^2: measures the total variability of the original observations.

Regression sum of squares SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2: measures the variability explained by the regression model.

From the previous slide, SST = SSR + SSE, so

    1 = \frac{SSR}{SST} + \frac{SSE}{SST}.

SSR/SST: the portion of the total variability that is explained by the model.
SSE/SST: the portion of unexplained variability.
Linear regression: Least Squares Fitting

Recall SSE = SST(1 - R^2) and R^2 = SSR/SST, so:

If R^2 \approx 1: most of the variability is explained by the model.
If R^2 \approx 0: the regression model explains little of the variability (an inefficient model).

Some properties of the residuals:

    \sum_{i=1}^{n} e_i = 0, \quad \sum_{i=1}^{n} e_i x_i = 0, \quad \frac{1}{n}\sum_{i=1}^{n} \hat{y}_i = \bar{y}.

Exercise: Check the previous properties.
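A minimal numerical sketch of the least squares formulas and R^2 on synthetic data (the data-generating values and seed are arbitrary assumptions), cross-checked against numpy's polyfit.

```python
# beta1_hat = S_xy / S_xx, beta0_hat = ybar - beta1_hat * xbar, R^2 = SSR / SST.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, 100)       # true beta0 = 2, beta1 = 1.5

xbar, ybar = x.mean(), y.mean()
S_xx = np.sum((x - xbar) ** 2)
S_xy = np.sum((x - xbar) * (y - ybar))
beta1_hat = S_xy / S_xx
beta0_hat = ybar - beta1_hat * xbar

y_hat = beta0_hat + beta1_hat * x
SST = np.sum((y - ybar) ** 2)
SSE = np.sum((y - y_hat) ** 2)
R2 = 1 - SSE / SST                                # = SSR / SST

print(beta0_hat, beta1_hat, R2)
print(np.polyfit(x, y, 1))                        # [slope, intercept] for comparison
```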
Linear regression: Linear regression model

Linear relationship between the response and explanatory variables plus an error term. The model:

    y = \beta_0 + \beta_1 x + \epsilon,

where \epsilon, called the error term or disturbance, includes all the other factors.

Denote by y_i the response to x_i, i = 1, ..., n. Assume that \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample from a distribution with

    y_i = \beta_0 + \beta_1 x_i + \epsilon_i,

where the \epsilon_i are i.i.d. and \epsilon_i | x_i \sim N(0, \sigma^2). \beta_0 (resp. \beta_1) is called the intercept (resp. slope).
Linear regression: Linear regression model

In what follows, we suppose that E(y_i^2) < \infty, E(x_i^2) < \infty, and E(x_i^2) \ne 0.

Theorem. Under the linear model assumptions, the coefficients are given as follows:

    \beta_1 = \frac{cov(x_i, y_i)}{Var(x_i)}, \quad \beta_0 = E(y_i) - \beta_1 E(x_i).

Proof. By the law of iterated expectations (LIE), E(x_i \epsilon_i) = 0, i.e. E(x_i(y_i - \beta_0 - \beta_1 x_i)) = 0. On the other hand, E(y_i) = \beta_0 + \beta_1 E(x_i). Solving these two equations in the two unknowns \beta_0 and \beta_1 gives the result.
Linear regression: Linear regression model

Lemma. Assume that \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample from the population. Then

    \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

are consistent estimators of \beta_1 and \beta_0, respectively. Furthermore, the conditional distribution of \hat{\beta}_1 (resp. \hat{\beta}_0) is normal with mean \beta_1 (resp. \beta_0) and variance \frac{\sigma^2}{S_{xx}} (resp. \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)).
Linear regression: Linear regression model

Proof. Since \{(y_i, x_i)\}_{i=1}^{n} is i.i.d., \{x_i y_i\}_{i=1}^{n} and \{x_i^2\}_{i=1}^{n} are also i.i.d. From the WLLN, \frac{S_{xy}}{n} \xrightarrow{P} cov(x_i, y_i), and similarly \frac{S_{xx}}{n} \xrightarrow{P} Var(x_i). Since x \mapsto \frac{1}{x} is continuous on any neighborhood of a point x_0 \ne 0, we apply the Continuous Mapping Theorem to get \frac{n}{S_{xx}} \xrightarrow{P} \frac{1}{Var(x_i)}. Therefore,

    \frac{S_{xy}}{S_{xx}} \xrightarrow{P} \frac{cov(x_i, y_i)}{Var(x_i)} = \beta_1.

As \bar{y} and \bar{x} are consistent for E(y_i) and E(x_i) respectively, we obtain the consistency of \hat{\beta}_0 for \beta_0.
It can be shown that

    \hat{\beta}_1 = \sum_{i=1}^{n} \frac{x_i - \bar{x}}{S_{xx}}\, y_i, \quad \hat{\beta}_0 = \sum_{i=1}^{n} \left(\frac{1}{n} - \bar{x}\,\frac{x_i - \bar{x}}{S_{xx}}\right) y_i.

As y_i | x_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2) and a linear combination of independent normally distributed random variables is normally distributed, we obtain the rest of the lemma.

Remark: Since E(\hat{\beta}_0) = \beta_0 and E(\hat{\beta}_1) = \beta_1, the estimators \hat{\beta}_0 and \hat{\beta}_1 are unbiased.

An unknown parameter remains to be estimated. What is it?
Linear regression: Linear regression model

The answer is \sigma^2. Recall that \sigma^2 is the variance of the dispersion around the model.

Small \sigma^2: the (x_i, y_i) lie close to the true regression line.
Large \sigma^2: the model is weak at explaining the observed values (x_i, y_i).

Lemma. The statistic

    s^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}    (20)

is an unbiased estimator of \sigma^2. Furthermore,

    \frac{(n-2)s^2}{\sigma^2} \sim \chi^2_{n-2},    (21)
where \chi^2_{n-2} is the chi-square distribution with n-2 degrees of freedom. Moreover, s^2 is independent of both \hat{\beta}_0 and \hat{\beta}_1.

Proof. The proof is basically straightforward once we show that \hat{\beta}_0 + \hat{\beta}_1 x_j is normally distributed for every j. We have

    \hat{\beta}_0 + \hat{\beta}_1 x_j = \sum_{i=1}^{n} \left(\frac{1}{n} + \frac{(x_j - \bar{x})(x_i - \bar{x})}{S_{xx}}\right) y_i,

which is a linear combination of mutually independent, normally distributed random variables y_i. Given E(y_i | x_i) = \beta_0 + \beta_1 x_i and V(y_i) = \sigma^2, we get that \hat{\beta}_0 + \hat{\beta}_1 x_j is normally distributed with mean, variance and
estimated standard error given respectively by:

    E(\hat{\beta}_0 + \hat{\beta}_1 x_j | x) = \beta_0 + \beta_1 x_j,
    V(\hat{\beta}_0 + \hat{\beta}_1 x_j | x) = \sigma^2\left[\frac{1}{n} + \frac{(x_j - \bar{x})^2}{S_{xx}}\right],
    s(\hat{\beta}_0 + \hat{\beta}_1 x_j | x) = s\sqrt{\frac{1}{n} + \frac{(x_j - \bar{x})^2}{S_{xx}}}.

On the other hand, recall that the residuals satisfy

    \sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (Y_i - \hat{Y}_i) = 0, \quad \sum_{i=1}^{n} e_i x_i = \sum_{i=1}^{n} (Y_i - \hat{Y}_i) x_i = 0.

Therefore we can eliminate two summands from the sum of squares
\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, so there are only n-2 independent differences y_i - \hat{y}_i.

We need to show that s^2 is an unbiased estimator, which is equivalent to showing that

    E\left(\sum_{i=1}^{n} (y_i - \hat{y}_i)^2\right) = (n-2)\sigma^2.

One can easily prove that SSE = S_{yy} - \hat{\beta}_1^2 S_{xx}. Taking expectations of both sides gives

    E(SSE | x) = E(S_{yy} | x) - E(\hat{\beta}_1^2 | x)\, S_{xx},

where E(\hat{\beta}_1^2 | x) = \frac{\sigma^2}{S_{xx}} + \beta_1^2 (use V(X) = E(X^2) - (E(X))^2),

    E(y_i^2) = \sigma^2 + (\beta_0 + \beta_1 x_i)^2, \quad E(\bar{y}^2) = \frac{\sigma^2}{n} + (\beta_0 + \beta_1 \bar{x})^2.
So E(S_{yy}) = (n-1)\sigma^2 + \beta_1^2 S_{xx}, hence E(SSE) = (n-2)\sigma^2.

Conclusion: s^2 is an unbiased estimator of \sigma^2.
Linear regression: Confidence intervals for the Regression Coefficients

Definition. (Two-sided confidence intervals) An interval of the form [a, b], where a \le b, is said to be a 100(1-\alpha)% confidence interval for the parameter \theta if

    P(a \le \theta \le b) \ge 1 - \alpha.

Exercise: Suppose that \theta \sim N(\mu, \sigma^2). Find a 95% confidence interval for \theta.
Linear regression: Confidence intervals for the Regression Coefficients

Definition. (One-sided confidence intervals) An interval of the form [a, \infty) is a 100(1-\alpha)% lower one-sided confidence interval for the parameter \theta if

    P(a \le \theta) \ge 1 - \alpha.

Similarly, an interval of the form (-\infty, b] is an upper one-sided confidence interval for the parameter \theta if

    P(\theta \le b) \ge 1 - \alpha.

Exercise: Suppose that \theta \sim N(\mu, \sigma^2). Find 95% lower and upper one-sided confidence intervals for \theta.
Linear regression: Confidence intervals for the Regression Coefficients

For \beta_1: Recall

    \frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{S_{xx}}} \sim N(0, 1), \quad \frac{(n-2)s^2}{\sigma^2} \sim \chi^2_{n-2}.

So

    \frac{\hat{\beta}_1 - \beta_1}{s/\sqrt{S_{xx}}} \sim t_{n-2}.

s(\hat{\beta}_1) = \frac{s}{\sqrt{S_{xx}}}: the estimated standard error of the estimate.

100(1-\alpha)% confidence interval for \beta_1:

    \hat{\beta}_1 \pm t_{n-2}(\alpha/2)\, s(\hat{\beta}_1)
Linear regression: Confidence intervals for the Regression Coefficients

For \beta_0: The estimated standard error of \hat{\beta}_0 is

    s(\hat{\beta}_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}.

As we did for \beta_1, the sampling distribution of \hat{\beta}_0 gives

    \frac{\hat{\beta}_0 - \beta_0}{s(\hat{\beta}_0)} \sim t_{n-2}.

100(1-\alpha)% confidence interval for \beta_0:

    \hat{\beta}_0 \pm t_{n-2}(\alpha/2)\, s(\hat{\beta}_0)
Linear regression: Test of Hypotheses

Suppose we want to test the hypothesis H_0: \beta_1 = 0 against H_1: \beta_1 \ne 0. This amounts to answering the question: does a variation of x have an effect on the response variable y?

Under the hypothesis H_0: \beta_1 = 0,

    \frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}} \sim t_{n-2}.

Then a two-sided level \alpha test of H_0 is: reject H_0 if

    \left|\frac{\hat{\beta}_1}{s/\sqrt{S_{xx}}}\right| \ge t_{n-2}(\alpha/2).
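A sketch putting the confidence interval for \beta_1 and the level-\alpha test of H_0: \beta_1 = 0 together on synthetic data (all numerical choices below are illustrative assumptions).

```python
# s^2 = SSE/(n-2), s(beta1_hat) = s/sqrt(S_xx), CI = beta1_hat +/- t_{n-2}(alpha/2)*s(beta1_hat).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 2.0, n)

xbar = x.mean()
S_xx = np.sum((x - xbar) ** 2)
beta1_hat = np.sum((x - xbar) * (y - y.mean())) / S_xx
beta0_hat = y.mean() - beta1_hat * xbar
resid = y - (beta0_hat + beta1_hat * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))          # sqrt of the unbiased sigma^2 estimate

se_beta1 = s / np.sqrt(S_xx)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)       # t_{n-2}(alpha/2) with alpha = 0.05
ci = (beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)

t_stat = beta1_hat / se_beta1                      # test of H0: beta_1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(ci, t_stat, p_value)
```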
Linear regression: CI for the mean response E(y_i | x_i)

To derive point and interval estimates for E(y_i | x_i), we need to study the sampling distribution of the estimator \hat{\beta}_0 + \hat{\beta}_1 x_i.

Theorem. The point estimator \hat{\beta}_0 + \hat{\beta}_1 x_i is an unbiased estimator of the mean response E(y_i | x_i). Furthermore, it is normally distributed with mean, variance, and estimated standard error given by:

    E(\hat{\beta}_0 + \hat{\beta}_1 x_i) = \beta_0 + \beta_1 x_i,
    V(\hat{\beta}_0 + \hat{\beta}_1 x_i) = \sigma^2\left(\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}\right),
    s(\hat{\beta}_0 + \hat{\beta}_1 x_i) = s\sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}}.
Proof. Exercise.

Consequently,

    \frac{\hat{\beta}_0 + \hat{\beta}_1 x_i - (\beta_0 + \beta_1 x_i)}{s(\hat{\beta}_0 + \hat{\beta}_1 x_i)} \sim t_{n-2}.

100(1-\alpha)% CI for the mean response:

    \hat{\beta}_0 + \hat{\beta}_1 x_i \pm t_{n-2}(\alpha/2)\, s(\hat{\beta}_0 + \hat{\beta}_1 x_i)
Linear regression: Prediction Interval

Theorem. The estimated standard error s(y_i - \hat{y}_i) is given by

    s(y_i - \hat{y}_i) = s\sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}}

and

    \frac{y_i - \hat{y}_i}{s(y_i - \hat{y}_i)} \sim t_{n-2}.

Proof. Use y_i = \beta_0 + \beta_1 x_i + \epsilon, where \epsilon | x_i \sim N(0, \sigma^2),
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, the fact that \hat{y}_i is independent of \epsilon, and

    V(y_i - \hat{y}_i) = V(y_i) + V(\hat{y}_i).

100(1-\alpha)% prediction interval:

    \hat{\beta}_0 + \hat{\beta}_1 x_i \pm t_{n-2}(\alpha/2)\, s(y_i - \hat{y}_i)
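A sketch contrasting the interval for the mean response with the prediction interval at a new point x_0 (the data, x_0, and \alpha below are arbitrary assumptions); the extra "1 +" term makes the prediction interval wider.

```python
import numpy as np
from scipy import stats

def intervals(x, y, x0, alpha=0.05):
    n = len(x)
    xbar = x.mean()
    S_xx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - y.mean())) / S_xx
    b0 = y.mean() - b1 * xbar
    s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    fit = b0 + b1 * x0
    se_mean = s * np.sqrt(1 / n + (x0 - xbar) ** 2 / S_xx)        # mean response
    se_pred = s * np.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / S_xx)    # new observation
    return (fit - t_crit * se_mean, fit + t_crit * se_mean), \
           (fit - t_crit * se_pred, fit + t_crit * se_pred)

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 40)
y = 1.0 + 0.8 * x + rng.normal(0, 2.0, 40)
print(intervals(x, y, x0=5.0))
```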
Linear regression: Least squares linear predictor

Define y_i \in \mathbb{R}: the explained variable or regressand; x_i \in \mathbb{R}^d: the explanatory variables or regressors.

The idea is to find a "good" predictor of y_i by a linear combination of x_i, i = 1, ..., n. "Good" means picking the linear combination of x_i which minimizes the expected squared error. In other words,

    \beta = \arg\min_{b \in \mathbb{R}^d} E[(y_i - x_i'b)^2]    (22)
Linear regression: Least squares linear predictor

Theorem. Under the assumptions

1. E[y_i^2] < \infty
2. E[x_i x_i'] is a non-singular d \times d matrix
3. E[x_i' x_i] < \infty

we have

    \beta = (E[x_i x_i'])^{-1} E(x_i y_i).

Proof. We have, for all b,

    E[(y_i - x_i'b)^2] = E(y_i^2) - 2E(y_i x_i')b + b'E(x_i x_i')b.
First-order condition:

    \frac{\partial E[(y_i - x_i'b)^2]}{\partial b} = 0.

Recall that if \Sigma is a symmetric d \times d matrix and C, X \in \mathbb{R}^d, then

    \frac{\partial\, C'X}{\partial X} = C, \quad \frac{\partial\, X'\Sigma X}{\partial X} = 2\Sigma X.

So

    -2E(x_i y_i) + 2E(x_i x_i')b = 0,

and therefore \beta = (E[x_i x_i'])^{-1} E(x_i y_i).

Note that if we choose, for example, x_{i1} = 1, we work within an affine
framework. Moreover, if we choose d = 2, we obtain the same results as for the simple linear regression model above.
Linear regression: Least squares linear predictor

Theorem. Assume \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample from a given population. Under the model assumptions,

    \hat{\beta} = \left[\sum_{i=1}^{n} x_i x_i'\right]^{-1} \sum_{i=1}^{n} x_i y_i    (23)

is a consistent estimator of \beta.

Proof. From the Cauchy-Schwarz inequality,

    \|E(x_i y_i)\| \le E(\|x_i\|\,|y_i|) \le (E(x_i' x_i))^{1/2} (E(y_i^2))^{1/2} < \infty.

As \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample, \{x_i y_i\}_{i=1}^{n} and \{x_i x_i'\}_{i=1}^{n} are
also i.i.d. From the WLLN,

    \frac{1}{n}\sum_{i=1}^{n} x_i y_i \xrightarrow{P} E(x_i y_i), \quad \frac{1}{n}\sum_{i=1}^{n} x_i x_i' \xrightarrow{P} E(x_i x_i').

Since g(A) = A^{-1} is continuous at any invertible A, by the continuous mapping theorem,

    \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right]^{-1} \xrightarrow{P} [E(x_i x_i')]^{-1}.
So

    \left[\sum_{i=1}^{n} x_i x_i'\right]^{-1} \sum_{i=1}^{n} x_i y_i \xrightarrow{P} [E(x_i x_i')]^{-1} E(x_i y_i) = \beta.

One can easily prove that the estimator defined above is the same as the OLS estimator. Recall that the OLS estimator is defined as \arg\min_{b \in \mathbb{R}^d} e'e, where e = Y - Xb and

    Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \in \mathbb{R}^{n \times 1}, \quad X = \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} \in \mathbb{R}^{n \times d}.
Using the first-order condition, we get

    \hat{\beta} = (X'X)^{-1} X'Y.

Now we can prove that the OLS estimator is the same as (23). It is easy: just use the fact that if A = (a_1, ..., a_n) \in \mathbb{R}^{m \times n} and B \in \mathbb{R}^{n \times p} has rows b_1', ..., b_n', then AB = \sum_{i=1}^{n} a_i b_i'.
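A minimal sketch of the matrix form \hat{\beta} = (X'X)^{-1}X'Y on synthetic data with an intercept column (d = 3; all values are arbitrary assumptions), cross-checked against numpy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])   # x_{i1} = 1
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0, 1.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^{-1} X'Y without an explicit inverse
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq)
```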
Linear regression: OLS

Definition 2. Define a regression model by y_i = x_i'\beta + \epsilon_i with E(\epsilon_i | x_i) = 0. If V(y_i | x_i) is constant, the model is homoskedastic. If it depends on x_i, the model is called heteroskedastic. In fact,

    V(y_i | x_i) = V(x_i'\beta + \epsilon_i | x_i) = V(\epsilon_i | x_i) = E(\epsilon_i^2 | x_i) - (E(\epsilon_i | x_i))^2 = E(\epsilon_i^2 | x_i) = \sigma^2(x_i).
Linear regression: Finite-Sample Properties (Homoskedastic Case)

The model: \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample from

    y_i = x_i'\beta + \epsilon_i, \quad E(\epsilon_i | x_i) = 0, \quad V(\epsilon_i | x_i) = E(\epsilon_i^2 | x_i) = \sigma^2.

Let us study the properties of the OLS estimator when applied to this model. We need to study its:

- bias
- variance
- consistency
- asymptotic distribution
Linear regression: OLS Bias

Recall \hat{\beta} = (X'X)^{-1} X'Y. So the estimation error is

    \hat{\beta} - \beta = (X'X)^{-1} X'(X\beta + \epsilon) - \beta = \beta + (X'X)^{-1} X'\epsilon - \beta = (X'X)^{-1} X'\epsilon.

So

    E(\hat{\beta} - \beta | X) = (X'X)^{-1} X' E(\epsilon | X) = 0.
The OLS estimator is conditionally unbiased. We can deduce that E(\hat{\beta} - \beta) = 0; therefore the OLS estimator is also unconditionally unbiased.
Linear regression: OLS Variance

    V(\hat{\beta} | X) = V(\hat{\beta} - \beta | X) = V((X'X)^{-1} X'\epsilon | X) = (X'X)^{-1} X'\, V(\epsilon | X)\, X (X'X)^{-1}.

We used the property V(AX) = A\,V(X)\,A'. On the other hand,

    E(\epsilon_i^2 | X) = E(\epsilon_i^2 | x_i) = \sigma^2.
For i \ne j,

    E(\epsilon_i \epsilon_j | X) = E(\epsilon_i E(\epsilon_j | X, \epsilon_i) | X) = E(\epsilon_i E(\epsilon_j | X) | X) = E(\epsilon_i | X)\, E(\epsilon_j | X) = 0.

So E(\epsilon\epsilon' | X) = \sigma^2 I_n, and therefore V(\hat{\beta} | X) = \sigma^2 (X'X)^{-1}.

Recall that for the affine model we estimated 2 coefficients, and we found that the statistic s^2 = \frac{e'e}{n-2} is an unbiased and consistent estimator of \sigma^2. Using the same methodology, one can show that

    s^2 = \frac{e'e}{n - d}

is also an unbiased and consistent estimator of \sigma^2 for a model with d regressors.
Linear regression: Consistency of OLS

Suppose that \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample from y_i = x_i'\beta + \epsilon_i and

- E(y_i^2) < \infty, E(x_i'x_i) < \infty
- E(x_i x_i') is non-singular
- E(\epsilon_i | x_i) = 0, E(\epsilon_i^2 | x_i) = \sigma^2
- E(\epsilon_i^4) < \infty, E(x_{ij}^4) < \infty, j = 1, ..., d.

Then

    \hat{\beta} = (X'X)^{-1} X'Y
is a consistent estimator of \beta.

Proof. We have

    \hat{\beta} = (X'X)^{-1} X'Y = \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right]^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i y_i
                = \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right]^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i (x_i'\beta + \epsilon_i)
                = \beta + \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right]^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i,

so \hat{\beta} \xrightarrow{P} \beta if \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right]^{-1} \frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i \xrightarrow{P} 0, which holds if \frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i \xrightarrow{P} 0.
We have E(x_i x_i') < \infty, so by the CMT,

    \left[\frac{1}{n}\sum_{i=1}^{n} x_i x_i'\right]^{-1} \xrightarrow{P} (E(x_i x_i'))^{-1}.

Also,

    |E(x_{ij}\epsilon_i)| \le E(|x_{ij}\epsilon_i|) \le (E(x_{ij}^2))^{1/2} (E(\epsilon_i^2))^{1/2} \le (E(x_i'x_i))^{1/2}\,\sigma < \infty.

So from the WLLN, \frac{1}{n}\sum_{i=1}^{n} x_i \epsilon_i \xrightarrow{P} E(x_i \epsilon_i) = 0, because E(\epsilon_i | x_i) = 0. Using the CMT, \hat{\beta} \xrightarrow{P} \beta.
Linear regression: Asymptotic Properties of OLS

Under the previous assumptions, we have

    \sqrt{n}(\hat{\beta} - \beta) \xrightarrow{D} N(0, \sigma^2 [E(x_i x_i')]^{-1}).

Proof. Since \{(y_i, x_i)\}_{i=1}^{n} is i.i.d., \{x_i \epsilon_i\}_{i=1}^{n} is also i.i.d., and E(x_i \epsilon_i) = 0. Furthermore,

    |E(\epsilon_i^2 x_{ij} x_{im})| \le E(|\epsilon_i^2 x_{ij} x_{im}|) \le (E(\epsilon_i^4))^{1/2} (E(x_{ij}^4))^{1/4} (E(x_{im}^4))^{1/4} < \infty,

so V(x_i \epsilon_i) is finite and well defined. Furthermore,

    V(x_i \epsilon_i) = E(\epsilon_i^2 x_i x_i') = \sigma^2 E(x_i x_i').
By the CLT,

    \frac{1}{\sqrt{n}}\sum_{i=1}^{n} (x_i \epsilon_i - E(x_i \epsilon_i)) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} x_i \epsilon_i \xrightarrow{D} N(0, \sigma^2 E(x_i x_i')).

By applying Slutsky's theorem, we obtain the result.

Lemma. Under the model assumptions, for all a \in \mathbb{R}^d,

    \frac{\sqrt{n}\, a'(\hat{\beta} - \beta)}{\sqrt{s^2\, a'\left(\frac{X'X}{n}\right)^{-1} a}} \xrightarrow{D} N(0, 1).

Proof. As s^2 = \frac{e'e}{n-d} \xrightarrow{P} \sigma^2, the CMT gives s^2 \left(\frac{X'X}{n}\right)^{-1} \xrightarrow{P} \sigma^2 [E(x_i x_i')]^{-1}.
From the CMT,

    \sqrt{n}\, a'(\hat{\beta} - \beta) \xrightarrow{D} N(0, \sigma^2 a'[E(x_i x_i')]^{-1} a)

and

    \sqrt{\frac{\sigma^2\, a'[E(x_i x_i')]^{-1} a}{s^2\, a'\left(\frac{X'X}{n}\right)^{-1} a}} \xrightarrow{P} 1.

The result is proved by using Slutsky's theorem.
Linear regression: Null Hypothesis Test

Suppose H_0: a'\beta = c_0 is true. Then

    \frac{\sqrt{n}(a'\hat{\beta} - c_0)}{\sqrt{s^2\, a'\left(\frac{X'X}{n}\right)^{-1} a}} \xrightarrow{D} N(0, 1).

Test with significance level \alpha: we reject H_0: a'\beta = c_0 and accept H_1: a'\beta \ne c_0 if

    \left|\frac{\sqrt{n}(a'\hat{\beta} - c_0)}{\sqrt{s^2\, a'\left(\frac{X'X}{n}\right)^{-1} a}}\right| > q,

where q is defined such that P(|Z| > q) = \alpha, Z \sim N(0, 1).
Linear regression: Confidence Interval Set

We cannot reject H_0: a'\beta = c_0 if

    a'\hat{\beta} - q\sqrt{s^2\, a'(X'X)^{-1} a} \le c_0 \le a'\hat{\beta} + q\sqrt{s^2\, a'(X'X)^{-1} a}.

Exercise: Consider c_0 = 0 and a = (1, 0, ..., 0)'. Interpret the result.
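A sketch of the asymptotic test of H_0: a'\beta = c_0 in the homoskedastic case (here a picks out the second coefficient and c_0 = 0, both arbitrary choices). Note that \sqrt{n}(a'\hat{\beta} - c_0)/\sqrt{s^2 a'(X'X/n)^{-1}a} simplifies to (a'\hat{\beta} - c_0)/\sqrt{s^2 a'(X'X)^{-1}a}, which is what the code computes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, d = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2 = e @ e / (n - d)                                  # homoskedastic variance estimate

a = np.array([0.0, 1.0, 0.0])                         # tests H0: beta_2 = c0
c0 = 0.0
se = np.sqrt(s2 * a @ np.linalg.inv(X.T @ X) @ a)     # sqrt(s^2 a'(X'X)^{-1} a)
z = (a @ beta_hat - c0) / se
p_value = 2 * stats.norm.sf(abs(z))
print(z, p_value)
```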
Linear regression: Heteroskedastic Model

The model: \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample from y_i = x_i'\beta + \epsilon_i with

- E(\epsilon_i | x_i) = 0
- V(\epsilon | X) = E(\epsilon\epsilon' | X) = \mathrm{diag}(\sigma^2(x_1), ..., \sigma^2(x_i), ..., \sigma^2(x_n)) = \Omega, where \sigma^2(x_i) = E(\epsilon_i^2 | x_i)
- E(y_i^2) < \infty, E(x_i'x_i) < \infty
- E(x_i x_i') is non-singular
- E(\epsilon_i^4) < \infty, E(x_{ij}^4) < \infty, j = 1, ..., d.
One can prove (Exercise) that

    V(\hat{\beta} | X) = (X'X)^{-1} X'\Omega X (X'X)^{-1}, \quad E(\hat{\beta} | X) = E(\hat{\beta}) = \beta,

    \sqrt{n}(\hat{\beta} - \beta) \xrightarrow{D} N(0, [E(x_i x_i')]^{-1} E(\sigma^2(x_i) x_i x_i') [E(x_i x_i')]^{-1}).

Exercise: Derive an estimator of [E(x_i x_i')]^{-1} E(\sigma^2(x_i) x_i x_i') [E(x_i x_i')]^{-1} and a confidence interval for a coefficient \beta_j.
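One common estimator of the sandwich matrix is White's heteroskedasticity-robust estimator (HC0), which replaces \sigma^2(x_i) by the squared residual e_i^2. The sketch below is offered as an assumption-laden illustration, not necessarily the unique solution intended by the exercise.

```python
import numpy as np

def robust_cov(X, y):
    n, d = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    meat = X.T @ (X * e[:, None] ** 2)        # sum_i e_i^2 x_i x_i'
    V_hat = XtX_inv @ meat @ XtX_inv          # estimates V(beta_hat | X)
    return beta_hat, V_hat

rng = np.random.default_rng(9)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
eps = rng.normal(0, 0.5 + 0.5 * X[:, 1], n)   # error variance grows with x: heteroskedastic
y = X @ np.array([1.0, 2.0]) + eps
beta_hat, V_hat = robust_cov(X, y)
print(beta_hat, np.sqrt(np.diag(V_hat)))      # coefficients and robust standard errors
```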
Linear regression: Generalized Least Squares Estimators

The model: \{(y_i, x_i)\}_{i=1}^{n} is an i.i.d. sample from y_i = x_i'\beta + \epsilon_i with

- E(\epsilon_i | x_i) = 0
- V(\epsilon | X) = E(\epsilon\epsilon' | X) = \Omega > 0
- E(y_i^2) < \infty, E(x_i'x_i) < \infty
- E(x_i x_i') is non-singular
- E(\epsilon_i^4) < \infty, E(x_{ij}^4) < \infty, j = 1, ..., d.
The GLS estimator, defined by

    \tilde{\beta} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} Y,

is the best linear unbiased and consistent estimator.

Proof.
- Decompose \Omega^{-1} = C'C.
- Define the transformed variables \tilde{Y} = CY, \tilde{X} = CX, and \tilde{\epsilon} = C\epsilon.
- Use the following lemma:

Lemma 3. (Gauss-Markov) The OLS estimator is the Best Linear Unbiased Estimator (BLUE) under the linear model with E(\epsilon | X) = 0 and V(\epsilon | X) = \sigma^2 I_n.
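A sketch of the whitening idea used in the proof, under the simplifying assumption that \Omega is known and diagonal: take C = \Omega^{-1/2}, transform the data, and run OLS on (\tilde{X}, \tilde{Y}).

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
sigma2 = 0.25 + X[:, 1] ** 2                       # known conditional variances (assumed)
y = X @ np.array([1.0, 2.0]) + rng.normal(0, np.sqrt(sigma2))

C = np.diag(1 / np.sqrt(sigma2))                   # C'C = Omega^{-1}
Xt, yt = C @ X, C @ y
beta_gls = np.linalg.solve(Xt.T @ Xt, Xt.T @ yt)   # = (X'Omega^{-1}X)^{-1} X'Omega^{-1}Y
print(beta_gls)
```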
References

[1] Kallenberg, O., Foundations of Modern Probability, Springer-Verlag, New York, 1997.
[2] Rosenkrantz, W. A., Probability and Statistics for Science, Engineering, and Finance, Chapman & Hall, 2009.
[3] Wooldridge, J., Econometric Analysis of Cross Section and Panel Data, MIT Press, 2002.