The Statistical Property of Ordinary Least Squares


1 The Statistical Property of Ordinary Least Squares

The linear equation on which we apply OLS is
$$y_t = X_t\beta + u_t.$$
As we have derived, the OLS estimator is
$$\hat{\beta} = (X^T X)^{-1}X^T y.$$
Substituting the linear model into $y$, we get
$$\hat{\beta} = (X^T X)^{-1}X^T(X\beta + u) = \beta + (X^T X)^{-1}X^T u.$$
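
As a numerical aside (not part of the original notes), the following Python sketch computes the OLS formula above on simulated data; all variable names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # regressors incl. constant
beta_true = np.array([1.0, 2.0, -0.5])
u = rng.normal(size=n)
y = X @ beta_true + u

# OLS estimator beta_hat = (X'X)^{-1} X'y, computed via a linear solve
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```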

2 Thus, the statistical properties of OLS are essentially the statistical properties of $(X^T X)^{-1}X^T u$.

Definition: The OLS estimator is said to be unbiased if
$$E[\hat{\beta}] = \beta.$$
This means that, given the way the data are generated, before you start the estimation procedure you would expect the OLS estimate not to deviate from the true value on average. Because
$$\hat{\beta} - \beta = (X^T X)^{-1}X^T u,$$
OLS is unbiased if
$$E\left[(X^T X)^{-1}X^T u\right] = 0.$$

3 Now, let us use the Law of Iterated Expectations. That is,
$$E\left[(X^T X)^{-1}X^T u\right] = E\left[E\left[(X^T X)^{-1}X^T u \mid X\right]\right] = E\left[(X^T X)^{-1}X^T E[u\mid X]\right].$$
Therefore, the OLS estimator is unbiased if
$$E[u\mid X] = 0.$$
In this case, $E[\hat{\beta}\mid X] = \beta$ as well. This assumption states that the regressors $X$ are exogenous with respect to the error term. But this can be a strong assumption. In time series, it means that the mean of the current $u_t$, conditional on the values of $X_s$ in all periods (past, present, and future), has to be zero. Roughly, the current $u_t$ has to be orthogonal (or uncorrelated) to $X_s$ of all periods.

4 In some cases, we can make the assumption weaker, namely
$$E[u_t\mid X_t] = 0.$$
If the data are a time series, i.e. the sample is observed over time and $t$ indexes time periods, then this assumption is said to state that the regressors $X_t$ are predetermined with respect to the error term.

5 An example where the OLS estimator is biased:
$$y_t = \beta_1 + \beta_2 y_{t-1} + u_t, \qquad u_t \sim \mathrm{IID}(0, \sigma^2).$$
Then, using the FWL theorem, we first demean $y_t$ and $y_{t-1}$ to derive the OLS coefficient. Writing $y_{-1}$ for the vector of lagged values,
$$\hat{\beta}_2 = \left[y_{-1}^T M_\iota y_{-1}\right]^{-1}y_{-1}^T M_\iota y = \left[y_{-1}^T M_\iota y_{-1}\right]^{-1}y_{-1}^T M_\iota(y_{-1}\beta_2 + u) = \beta_2 + \left[y_{-1}^T M_\iota y_{-1}\right]^{-1}y_{-1}^T M_\iota u.$$
Notice that $\beta_2$ is a scalar, so $\beta_2 y_{-1} = y_{-1}\beta_2$. Because $E[u\mid M_\iota y_{-1}] = E[u\mid y_{-1} - \bar{y}_{-1}\iota] \neq 0$ in general, the OLS estimator is not unbiased.
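
A small Monte Carlo sketch of this bias (not in the original notes): with a short sample, the OLS estimate of the autoregressive coefficient lies on average below the true value. All parameter values below are illustrative.

```python
import numpy as np

# OLS in y_t = b1 + b2*y_{t-1} + u_t is biased in finite samples even with IID errors.
rng = np.random.default_rng(1)
b1, b2, n, reps = 0.0, 0.5, 30, 20000
est = np.empty(reps)
for r in range(reps):
    y = np.zeros(n)
    u = rng.normal(size=n)
    for t in range(1, n):
        y[t] = b1 + b2 * y[t - 1] + u[t]
    Y, Ylag = y[1:], y[:-1]
    X = np.column_stack([np.ones(n - 1), Ylag])
    est[r] = np.linalg.solve(X.T @ X, X.T @ Y)[1]
print(est.mean())   # noticeably below 0.5 for small n (downward bias)
```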

6 Consistency of the Least Squares Estimator

In this chapter, we will argue that as the sample size increases, the least squares estimator $\hat{\beta}$ converges in probability to the true value $\beta$.

Law of Large Numbers. Before showing the consistency of the OLS estimator, we discuss a very important theorem, the Law of Large Numbers. Let $u_t$, $t = 1, \ldots, n$, be random variables that are independently and identically distributed with finite mean $E[u_t] = \mu_u$ and finite variance $\sigma_u^2$. Then, the sample average is
$$\bar{u} = \frac{1}{n}\sum_{t=1}^n u_t,$$

7 which has mean
$$E[\bar{u}] = E\left[\frac{1}{n}\sum_{t=1}^n u_t\right] = \frac{1}{n}\sum_{t=1}^n E[u_t] = \frac{1}{n}\sum_{t=1}^n\mu = \mu.$$
Next, we derive the variance. For $n = 2$,
$$\mathrm{Var}(\bar{u}) = E\left[\left(\frac{u_1 + u_2}{2} - \mu\right)^2\right] = \frac{1}{4}E\left[(u_1 - \mu + u_2 - \mu)^2\right] = \frac{1}{4}E\left[(u_1-\mu)^2 + (u_2-\mu)^2 + 2(u_1-\mu)(u_2-\mu)\right]$$
$$= \frac{1}{4}\left[\mathrm{Var}(u_1) + \mathrm{Var}(u_2) + 2E(u_1-\mu)(u_2-\mu)\right] = \frac{1}{4}\left[\mathrm{Var}(u_1) + \mathrm{Var}(u_2)\right].$$
The last step holds because $u_1, u_2$ are independent:
$$\mathrm{Cov}(u_1, u_2) = E(u_1-\mu)(u_2-\mu) = E(u_1-\mu)\,E(u_2-\mu) = 0.$$

8 Similarly, for general $n$,
$$\mathrm{Var}(\bar{u}) = \mathrm{Var}\left(\frac{1}{n}\sum_{t=1}^n u_t\right) = E\left[\frac{1}{n}\sum_{t=1}^n(u_t - \mu)\right]^2 = \frac{1}{n^2}\left[\sum_{t=1}^n\mathrm{Var}(u_t) + \sum_{i\neq j}\mathrm{Cov}(u_i, u_j)\right] = \frac{1}{n^2}\sum_{t=1}^n\sigma_u^2 = \frac{\sigma_u^2}{n}.$$

9 The mean of the sample average is the true value:
$$E[\bar{u}] = \mu.$$
The variance of the sample average converges to zero as the sample size goes to infinity:
$$\mathrm{Var}(\bar{u}) \to 0 \text{ as } n \to \infty, \qquad \text{or} \qquad \lim_{n\to\infty}\mathrm{Var}(\bar{u}) = 0.$$
This means that as the sample size increases, the sample average is distributed more and more tightly around the true value. Then a deviation of the sample mean from the true value, no matter how small, becomes less and less likely. More formally, for any $\epsilon > 0$,
$$\Pr\left(\left|\bar{u} - \mu\right| > \epsilon\right) \to 0 \text{ as } n \to \infty,$$

10 or
$$\lim_{n\to\infty}\Pr\left(\left|\bar{u} - \mu\right| > \epsilon\right) = 0,$$
or
$$\operatorname{plim}_{n\to\infty}\bar{u} = \mu,$$
and in words, the sample average converges to the true mean in probability.

Theorem ((Weak) Law of Large Numbers): Let $u_t$, $t = 1, \ldots, n$, be independently and identically distributed with finite mean $\mu$ and variance $\sigma_u^2$. Then the sample average $\bar{u}$ converges to the true value $\mu$ in probability.
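
A quick simulation sketch of the law of large numbers (not part of the original notes; the mean, variance, and tolerance below are illustrative): the frequency of large deviations of the sample average from the true mean shrinks as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, eps = 1.0, 2.0, 0.1
for n in [10, 100, 1000, 10000]:
    # fraction of replications in which |u_bar - mu| exceeds eps
    deviations = [abs(rng.normal(mu, sigma, size=n).mean() - mu) > eps for _ in range(1000)]
    print(n, np.mean(deviations))
```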

11 Consistency of the OLS Estimator

Suppose, to simplify the discussion, that $X_t$ is random but independent of $u_t$. The OLS estimator is
$$\hat{\beta} = (X^T X)^{-1}X^T y = \beta + (X^T X)^{-1}X^T u.$$
Now, divide both the "denominator" and the "numerator" by the sample size $n$:
$$\hat{\beta} = \beta + \left[\frac{1}{n}X^T X\right]^{-1}\frac{1}{n}X^T u.$$
Now, let us assume that
$$\operatorname{plim}_{n\to\infty}\frac{1}{n}X^T X = S_{X^T X},$$
where $S_{X^T X}$ is invertible.

12 Then, it is known that we can write
$$\operatorname{plim}_{n\to\infty}\hat{\beta} = \beta + \left[\operatorname{plim}_{n\to\infty}\frac{1}{n}X^T X\right]^{-1}\operatorname{plim}_{n\to\infty}\frac{1}{n}X^T u = \beta + \left[S_{X^T X}\right]^{-1}\operatorname{plim}_{n\to\infty}\frac{1}{n}X^T u.$$
What is left for us to derive is
$$\operatorname{plim}_{n\to\infty}\frac{1}{n}X^T u.$$
Now,
$$\frac{1}{n}X^T u = \frac{1}{n}\sum_{t=1}^n x_t u_t.$$
Then $x_t u_t$, $t = 1, \ldots, n$, are independently and identically distributed random variables with finite mean $E[x_t u_t] = 0$ and (we assume) finite variance $\mathrm{Var}(x_t u_t) = \sigma_{xu}^2$. Therefore, the Law of Large Numbers holds and

13
$$\operatorname{plim}_{n\to\infty}\frac{1}{n}\sum_{t=1}^n x_t u_t = \mu_{xu} = 0.$$
Together, we have shown that
$$\operatorname{plim}_{n\to\infty}\hat{\beta} = \beta + \left[\operatorname{plim}_{n\to\infty}\frac{1}{n}X^T X\right]^{-1}\operatorname{plim}_{n\to\infty}\frac{1}{n}X^T u = \beta + \left[S_{X^T X}\right]^{-1}\cdot 0 = \beta.$$
Thus, the OLS estimator converges to the true parameter value in probability as the sample size increases. We also say that the OLS estimator is consistent.
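
A consistency sketch in code (not part of the original notes): with regressors simulated independently of the error, as assumed above, the OLS estimate settles down around the true coefficients as $n$ grows. Values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, -2.0])
for n in [50, 500, 5000, 50000]:
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, beta_hat)   # approaches [1.0, -2.0] as n increases
```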

14 The (Variance-)Covariance Matrix of the OLS Estimator

The (variance-)covariance matrix of the OLS estimator given $X$ is
$$\mathrm{Var}\left[\hat{\beta}\mid X\right] = E\left[\left(\hat{\beta} - E(\hat{\beta}\mid X)\right)\left(\hat{\beta} - E(\hat{\beta}\mid X)\right)^T\mid X\right],$$
which is a $k \times k$ matrix if $X$ has $k$ variables. The often-reported standard error of the $i$-th OLS parameter estimate is
$$\mathrm{std.error}(\hat{\beta}_i) = \sqrt{\left[\mathrm{Var}(\hat{\beta}\mid X)\right]_{ii}}.$$

15 Derivation of the Covariance Matrix

Remember that the OLS estimator is
$$\hat{\beta} = (X^T X)^{-1}X^T y = \beta + (X^T X)^{-1}X^T u,$$
and because OLS is unbiased, $E[\hat{\beta}] = \beta$. Therefore,
$$\hat{\beta} - E[\hat{\beta}] = (X^T X)^{-1}X^T u,$$
and
$$\left(\hat{\beta} - E[\hat{\beta}]\right)\left(\hat{\beta} - E[\hat{\beta}]\right)^T = \left[(X^T X)^{-1}X^T u\right]\left[(X^T X)^{-1}X^T u\right]^T = (X^T X)^{-1}X^T uu^T X(X^T X)^{-1}.$$

16 Now, assume that the error term of the linear model satisfies
$$\mathrm{Var}(u\mid X) = E[uu^T\mid X] = \sigma^2 I.$$
That is, because $E[u_t\mid X] = 0$, for any observation $t$ and for $s \neq t$,
$$\mathrm{Var}(u_t\mid X) = \left[E[uu^T\mid X]\right]_{tt} = \sigma^2, \qquad \mathrm{Cov}(u_s, u_t\mid X) = \left[E[uu^T\mid X]\right]_{st} = 0.$$
That is, the variance of the error term is the same for every observation, and the error terms of two different observations are uncorrelated.

17 Then,
$$\mathrm{Var}(\hat{\beta}\mid X) = (X^T X)^{-1}X^T E\left(uu^T\mid X\right)X(X^T X)^{-1} = (X^T X)^{-1}X^T\sigma^2 I\,X(X^T X)^{-1} = \sigma^2(X^T X)^{-1}.$$

18 Precision of Least Squares Estimators

The smaller the variance of an OLS parameter estimate $\hat{\beta}$, the higher we say the precision of the estimate is. Therefore, we define the precision matrix as the inverse of the covariance matrix:
$$\mathrm{Prec}(\hat{\beta}) = \mathrm{Var}(\hat{\beta})^{-1} = \sigma^{-2}(X^T X).$$
First, we can see that the smaller the variance of the error term, the smaller the variance of the estimator, i.e. the larger the precision. Secondly, because
$$X^T X = \sum_{t=1}^n x_t^T x_t,$$
usually the larger the sample size, the smaller the variance, i.e. the larger the precision.

19 Now, we look at the variance of a single OLS parameter, $\hat{\beta}_1$. As we have seen from the FWL theorem, if we regress both $y$ and $x_1$ on $X_2$ and take residuals, i.e. premultiply them by
$$M_2 = I - X_2(X_2^T X_2)^{-1}X_2^T,$$
then
$$\hat{\beta}_1 = \left(x_1^T M_2 x_1\right)^{-1}x_1^T M_2 y,$$
and the variance of $\hat{\beta}_1$ is
$$\mathrm{Var}(\hat{\beta}_1) = \sigma^2\left(x_1^T M_2 x_1\right)^{-1} = \frac{\sigma^2}{x_1^T M_2 x_1}.$$
If $x_1$ can be perfectly explained by the rest of $X$, i.e. by $X_2$, then $M_2 x_1 = 0$ and therefore the OLS estimator $\hat{\beta}_1$ is not defined; equivalently, its variance is infinite and its precision zero. This is what is called the multicollinearity problem. The smaller the sum of squares of the residual $M_2 x_1$, the larger the variance, and thus the smaller the precision.

20 Linear Functions of Parameter Estimates

Suppose that the objective of the OLS regression is to obtain parameter estimates so that we can form the predictor $\hat{y}_p$ given the parameter estimate $\hat{\beta}$ and a specific value $x_p$:
$$\hat{y}_p = x_p\hat{\beta}.$$
Notice that because the OLS estimator is unbiased and the mean of the error term is zero, the predictor is also unbiased. That is,
$$E[\hat{y}_p\mid x_p] = x_p E[\hat{\beta}] = x_p\beta = E[x_p\beta + u_p\mid x_p] = E[y_p\mid x_p],$$
and the variance of the predictor is
$$\mathrm{Var}[\hat{y}_p\mid x_p] = E\left[\left(x_p\hat{\beta} - x_p\beta\right)\left(x_p\hat{\beta} - x_p\beta\right)^T\mid x_p\right] = x_p E\left[\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)^T\right]x_p^T = x_p\mathrm{Var}(\hat{\beta})x_p^T.$$

21 The forecast error is
$$y_p - \hat{y}_p = x_p\beta + u_p - x_p\hat{\beta} = u_p + x_p\left(\beta - \hat{\beta}\right).$$
Because the error term $u_p$ and $X$ and $x_p$ are assumed to be uncorrelated, if we also assume that $u_p$ is uncorrelated with $u_t$, $t = 1, \ldots, n$, then $u_p$ and $\hat{\beta}$ are uncorrelated. Therefore,
$$\mathrm{Var}(y_p - \hat{y}_p) = \mathrm{Var}(u_p) + x_p\mathrm{Var}(\hat{\beta})x_p^T - 2\,\mathrm{Cov}(u_p, x_p\hat{\beta}) = \mathrm{Var}(u_p) + x_p\mathrm{Var}(\hat{\beta})x_p^T = \sigma^2 + \sigma^2 x_p(X^T X)^{-1}x_p^T.$$
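
A minimal code sketch of the prediction and forecast-error variance above (not part of the original notes), with $\sigma^2$ treated as known and all values illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma2 = np.array([2.0, 1.5]), 1.0
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
x_p = np.array([1.0, 0.3])                        # a specific regressor value
y_p_hat = x_p @ beta_hat                          # point prediction x_p * beta_hat
var_forecast = sigma2 * (1.0 + x_p @ XtX_inv @ x_p)  # sigma^2 + sigma^2 x_p (X'X)^{-1} x_p'
print(y_p_hat, var_forecast)
```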

22 In the same way as we have discussed, we can derive the standard error of any linear combination $\omega^T\hat{\beta}$ of the OLS estimates, as follows:
$$\mathrm{Var}(\omega^T\hat{\beta}) = E\left[\left(\omega^T\hat{\beta} - \omega^T\beta\right)\left(\omega^T\hat{\beta} - \omega^T\beta\right)^T\right] = \omega^T E\left[\left(\hat{\beta} - \beta\right)\left(\hat{\beta} - \beta\right)^T\right]\omega = \omega^T\mathrm{Var}(\hat{\beta})\omega = \sigma^2\omega^T(X^T X)^{-1}\omega.$$

23 Efficiency of the OLS Estimator

In this section, we will conclude that OLS is the most efficient estimator among all linear unbiased estimators; that is, roughly speaking, OLS has the smallest variance. But how do we define one variance covariance matrix $A$ to be smaller than another variance covariance matrix $B$? A convenient definition is the following: a symmetric matrix $A$ is smaller than another symmetric matrix $B$ if, for any vector $\omega$,
$$\omega^T(B - A)\omega \geq 0.$$
This is equivalent to saying that the symmetric matrix $B - A$ is positive semidefinite. A symmetric matrix $A$ is positive definite if for any nonzero vector $\omega$,
$$\omega^T A\omega = \sum_{i=1}^k\sum_{j=1}^k\omega_i\omega_j A_{ij} > 0.$$

24 Notice that a variance covariance matrix is always symmetric and positive semidefinite. Suppose $y_t$ is a vector of random variables. Then $(y_t - E[y_t])(y_t - E[y_t])^T$ is positive semidefinite, because for any vector $\omega$,
$$\omega^T(y_t - E[y_t])(y_t - E[y_t])^T\omega = \left[\omega^T(y_t - E[y_t])\right]^2 \geq 0.$$
Taking expectations,
$$E\left[\omega^T(y_t - E[y_t])(y_t - E[y_t])^T\omega\right] = \omega^T\mathrm{Var}(y_t)\omega \geq 0.$$

25 Gauss-Markov Theorem

Assume that $E[u\mid X] = 0$ and $E[uu^T\mid X] = \sigma^2 I$ in the linear regression model. Then the OLS estimator is more efficient than any other linear unbiased estimator $\tilde{\beta}$, i.e.
$$\mathrm{Var}(\tilde{\beta}) - \mathrm{Var}(\hat{\beta})$$
is a positive semidefinite matrix.

26 Proof. Consider an arbitrary linear estimator,
$$\tilde{\beta} = Ay.$$
Now, write
$$\tilde{\beta} = \hat{\beta} + (\tilde{\beta} - \hat{\beta}).$$
Then,
$$\mathrm{Var}(\tilde{\beta}\mid X) = \mathrm{Var}(\hat{\beta}\mid X) + \mathrm{Var}(\tilde{\beta} - \hat{\beta}\mid X) + 2\,\mathrm{Cov}(\hat{\beta}, \tilde{\beta} - \hat{\beta}\mid X).$$
What we will show below is that indeed
$$\mathrm{Cov}(\hat{\beta}, \tilde{\beta} - \hat{\beta}\mid X) = 0,$$
and therefore
$$\mathrm{Var}(\tilde{\beta}\mid X) = \mathrm{Var}(\hat{\beta}\mid X) + \mathrm{Var}(\tilde{\beta} - \hat{\beta}\mid X) \geq \mathrm{Var}(\hat{\beta}\mid X).$$

27 The OLS estimator is one such linear estimator, with
$$A = (X^T X)^{-1}X^T.$$
Because $Ay = A(X\beta + u)$,
$$E[Ay\mid X] = AX\beta + AE[u\mid X] = AX\beta.$$

28 In order for unbiasedness, $E[Ay\mid X] = \beta$, to hold for any true value $\beta$,
$$AX = I$$
has to hold. Since $(X^T X)^{-1}X^T X = I$ as well,
$$\left[A - (X^T X)^{-1}X^T\right]X = 0.$$
Now,
$$\tilde{\beta} - \hat{\beta} - E\left[\tilde{\beta} - \hat{\beta}\mid X\right] = A(X\beta + u) - \beta - (X^T X)^{-1}X^T u = \left[A - (X^T X)^{-1}X^T\right]u.$$

29 Because it is assumed that $\mathrm{Var}(u\mid X) = \sigma^2 I$,
$$\mathrm{Cov}\left(\tilde{\beta} - \hat{\beta},\,\hat{\beta}\mid X\right) = E\left(\left[A - (X^T X)^{-1}X^T\right]uu^T X(X^T X)^{-1}\mid X\right) = \left[A - (X^T X)^{-1}X^T\right]\sigma^2 I\,X(X^T X)^{-1}$$
$$= \sigma^2\left[A - (X^T X)^{-1}X^T\right]X(X^T X)^{-1} = 0.$$

30 Residuals and Error Terms

Now, consider the residuals of the OLS regression:
$$\hat{u} = y - X\hat{\beta} = \left(I - X(X^T X)^{-1}X^T\right)y = M_X y = M_X(X\beta + u) = M_X u.$$
Then, because $E[M_X u\mid X] = 0$,
$$\mathrm{Var}(\hat{u}\mid X) = E\left[\hat{u}\hat{u}^T\mid X\right] = M_X E\left[uu^T\mid X\right]M_X = M_X\,\sigma^2 I\,M_X = \sigma^2 M_X,$$
which is different from
$$\mathrm{Var}(u) = \sigma^2 I.$$
In fact,
$$\mathrm{Var}(u) - \mathrm{Var}(\hat{u}) = \sigma^2\left[I - M_X\right] = \sigma^2 P_X.$$

31 Notice that for any vector $y$,
$$y^T P_X y = y^T P_X P_X y = (X\hat{\beta})^T(X\hat{\beta}) \geq 0.$$
So $P_X$ is positive semidefinite. Therefore, in the matrix sense, $\mathrm{Var}(\hat{u})$ is smaller than $\mathrm{Var}(u)$. That is, OLS overfits the data, i.e. it makes the residuals have smaller variance than the error terms. As we have seen, the variance matrix of the OLS estimator is
$$\mathrm{Var}(\hat{\beta}) = \sigma^2(X^T X)^{-1}.$$
We need to derive an estimate of $\sigma^2 = \mathrm{Var}(u_t)$. A potential estimator is the sample variance of the residuals,
$$\frac{1}{n}\sum_{t=1}^n\left[\hat{u}_t - \bar{\hat{u}}\right]^2.$$
As long as a constant term $\iota$ is included in $X$, which is usually the case,
$$\bar{\hat{u}} = \frac{1}{n}\iota^T\hat{u} = 0.$$

32 Therefore,
$$\mathrm{Var}(\hat{u}_t) = \left[\sigma^2 M_X\right]_{tt} = \left[\mathrm{diag}\left(\sigma^2 M_X\right)\right]_t,$$
where $\mathrm{diag}(A)$ is the vector containing the diagonal elements of the $n \times n$ matrix $A$. Hence,
$$\sum_{t=1}^n\mathrm{Var}(\hat{u}_t) = \sum_{t=1}^n\left[\sigma^2 M_X\right]_{tt} = \mathrm{trace}\left(\sigma^2 M_X\right),$$
where $\mathrm{trace}(A)$ is the sum of all the diagonal elements, i.e. $\mathrm{trace}(A) = \sum_{i=1}^n A_{ii}$.

33 Notice that
$$\mathrm{trace}(A + B) = \mathrm{trace}(A) + \mathrm{trace}(B).$$
This is because $[A + B]_{ii} = A_{ii} + B_{ii}$. Also,
$$\mathrm{trace}(AB) = \mathrm{trace}(BA).$$
This is because
$$\mathrm{trace}(AB) = \sum_{i=1}^n[AB]_{ii} = \sum_{i=1}^n\sum_{j=1}^n A_{ij}B_{ji} = \sum_{j=1}^n\sum_{i=1}^n B_{ji}A_{ij} = \sum_{j=1}^n[BA]_{jj} = \mathrm{trace}(BA).$$

34 Therefore,
$$\mathrm{trace}(\sigma^2 M_X) = \mathrm{trace}\left(\sigma^2\left[I - X(X^T X)^{-1}X^T\right]\right) = \sigma^2\,\mathrm{trace}(I) - \sigma^2\,\mathrm{trace}\left(X(X^T X)^{-1}X^T\right)$$
$$= \sigma^2 n - \sigma^2\,\mathrm{trace}\left((X^T X)^{-1}X^T X\right) = \sigma^2 n - \sigma^2 k = \sigma^2(n - k).$$
Therefore,
$$E\left[\sum_{t=1}^n\hat{u}_t^2\right] = \sum_{t=1}^n\mathrm{Var}(\hat{u}_t) = \sigma^2(n - k) < \sigma^2 n = \sum_{t=1}^n\mathrm{Var}(u_t).$$

35 So, the unbiased estimate of the variance $\sigma^2$ is
$$s^2 = \frac{\mathrm{SSR}}{n - k} = \frac{1}{n - k}\sum_{t=1}^n\hat{u}_t^2.$$
Together, the estimate of the variance matrix of the OLS coefficients is
$$\widehat{\mathrm{Var}}(\hat{\beta}) = s^2(X^T X)^{-1}.$$
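
A short code sketch (not part of the original notes) of $s^2$ and the estimated covariance matrix and standard errors, on simulated data with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)            # s^2 = SSR / (n - k)
cov_beta = s2 * XtX_inv                 # estimated Var(beta_hat) = s^2 (X'X)^{-1}
std_err = np.sqrt(np.diag(cov_beta))    # reported standard errors
print(beta_hat, std_err)
```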

36 Misspecification of Linear Regression Models

In most situations, we do not know a priori what the true regression model is, i.e. which variables should belong in the regression equation.

Overspecification. Suppose we put more variables on the RHS than needed:
$$y = X\beta + Z\gamma + u, \qquad u \sim \mathrm{IID}(0, \sigma^2 I),$$
where $E(u\mid [X, Z]) = 0$ but $Z$ is redundant, i.e. $\gamma = 0$. Then,
$$\hat{\beta} = \left(X^T M_Z X\right)^{-1}X^T M_Z y = \left(X^T M_Z X\right)^{-1}X^T M_Z(X\beta + u) = \beta + \left(X^T M_Z X\right)^{-1}X^T M_Z u.$$

37 Because $E[u\mid X, Z] = 0$,
$$E\left[\left(X^T M_Z X\right)^{-1}X^T M_Z u\mid X, Z\right] = 0.$$
Therefore,
$$E\left[\hat{\beta}\mid X, Z\right] = \beta,$$
and thus the coefficient $\hat{\beta}$ is unbiased.

38 Variance:
$$\mathrm{Var}\left(\hat{\beta}\mid X, Z\right) = \sigma^2\left(X^T M_Z X\right)^{-1}.$$
Now, we know that
$$X^T M_Z X = X^T X - X^T P_Z X.$$
Hence,
$$X^T X - X^T M_Z X = X^T P_Z X,$$
which is positive semidefinite. So $X^T M_Z X \leq X^T X$ in the matrix sense, and therefore
$$\sigma^2\left(X^T X\right)^{-1} \leq \sigma^2\left(X^T M_Z X\right)^{-1}.$$
That is, if we include unnecessary variables, then as long as the assumption $E[u\mid X, Z] = 0$ is satisfied we keep unbiasedness, but we get an estimator with larger variance, i.e. we lose efficiency.

39 Underspecification

What if the true specification is
$$y = X\beta + Z\gamma + u$$
but we leave out $Z$? As we have discussed, the OLS estimate is
$$\hat{\beta} = (X^T X)^{-1}X^T(X\beta + Z\gamma + u) = \beta + (X^T X)^{-1}X^T(Z\gamma + u).$$
Given the assumption $E[u\mid X, Z] = 0$,
$$E\left[\hat{\beta}\mid X, Z\right] = \beta + (X^T X)^{-1}X^T Z\gamma.$$
As long as $X^T Z \neq 0$ and $\gamma \neq 0$, we have omitted variable bias.

40 Now,
$$\hat{\beta} - \beta = (X^T X)^{-1}X^T Z\gamma + (X^T X)^{-1}X^T u,$$
and
$$(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T = (X^T X)^{-1}X^T Z\gamma\gamma^T Z^T X(X^T X)^{-1} + (X^T X)^{-1}X^T uu^T X(X^T X)^{-1}$$
$$+ (X^T X)^{-1}X^T Z\gamma u^T X(X^T X)^{-1} + (X^T X)^{-1}X^T u\gamma^T Z^T X(X^T X)^{-1}.$$
Because $E\left[uu^T\mid X, Z\right] = \sigma^2 I$ and $E[u\mid X, Z] = 0$, the last two (cross) terms have conditional expectation zero.

41 It is not clear which OLS estimator has less variance. If $\gamma$ is small, then it could be that omitting an unimportant variable results in bias but improves the MSE. But with omitted variables, the variance covariance matrix gives the wrong message about the accuracy and reliability of the OLS estimate. With larger sample size, the bias dominates, and thus the problem with underspecification becomes more severe. If we take conditional expectations given $X, Z$, we obtain
$$\mathrm{MSE}(\hat{\beta}_o) = E\left[(\hat{\beta}_o - \beta)(\hat{\beta}_o - \beta)^T\mid X, Z\right] = (X^T X)^{-1}X^T Z\gamma\gamma^T Z^T X(X^T X)^{-1} + \sigma^2(X^T X)^{-1}.$$
Mean squared error measures the variation of an estimator around the true value. What if you include the variables $Z$ in the regression? Then $\hat{\beta}$ is unbiased, and thus
$$\mathrm{MSE}(\hat{\beta}) = \mathrm{Var}(\hat{\beta}) = \sigma^2\left(X^T M_Z X\right)^{-1}.$$
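
A Monte Carlo sketch of this trade-off (not part of the original notes): omitting a relevant regressor that is correlated with $X$ biases the short regression, while including it keeps the estimate unbiased at the cost of extra variance. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
beta, gamma, n, reps = 1.0, 0.5, 100, 5000
short, long_ = np.empty(reps), np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    z = 0.7 * x + rng.normal(size=n)          # Z correlated with X
    y = beta * x + gamma * z + rng.normal(size=n)
    short[r] = (x @ y) / (x @ x)              # omits Z: biased
    Xf = np.column_stack([x, z])
    long_[r] = np.linalg.solve(Xf.T @ Xf, Xf.T @ y)[0]   # includes Z: unbiased
print(short.mean(), long_.mean())             # roughly beta + bias vs. roughly beta
```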

42 Measures of Goodness of Fit

The simple $R^2$ is
$$R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{TSS}} = 1 - \frac{\sum_{t=1}^n\hat{u}_t^2}{\sum_{t=1}^n(y_t - \bar{y})^2}.$$
If we look at the numerator: the more independent variables we have, the smaller the SSR will be:
$$\sum_{t=1}^n\hat{u}_{k,t}^2 = \min_{(\beta_1,\ldots,\beta_k)}\sum_{t=1}^n\left(y_t - \sum_{j=1}^k x_{jt}\beta_j\right)^2 \geq \min_{(\beta_1,\ldots,\beta_{k+1})}\sum_{t=1}^n\left(y_t - \sum_{j=1}^{k+1}x_{jt}\beta_j\right)^2 = \sum_{t=1}^n\hat{u}_{k+1,t}^2.$$

43 The more independent variables you include (higher $k$), the smaller the RSS (residual sum of squares) becomes. If $k = n$, then the RSS becomes zero. So you can increase $R^2$ simply by putting more and more variables on the right-hand side. Hence, $R^2$ is not a good indication of the appropriateness of the linear model. One needs to find a measure of goodness of fit that penalizes having many independent variables.

44 For the numerator: instead of SSR, use the unbiased estimator of $\mathrm{Var}(u_t) = \sigma^2$, namely $s^2$. For the denominator: instead of TSS, use the unbiased estimator of $\mathrm{Var}(y_t)$:
$$\bar{R}^2 = 1 - \frac{\frac{1}{n-k}\sum_{t=1}^n\hat{u}_t^2}{\frac{1}{n-1}\sum_{t=1}^n(y_t - \bar{y})^2} = 1 - \frac{(n-1)\sum_{t=1}^n\hat{u}_t^2}{(n-k)\sum_{t=1}^n(y_t - \bar{y})^2}.$$
This is called the adjusted $R^2$. Given the same SSR, having more independent variables (higher $k$) reduces the adjusted $R^2$, so it puts a penalty on a high number of regressors. Notice that for very poorly fitting models, the adjusted $R^2$ can be negative.
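
A brief code sketch (not part of the original notes) computing the plain and adjusted $R^2$ from the residuals of an OLS fit on simulated data:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.8, 0.0, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
ssr = resid @ resid
tss = ((y - y.mean()) ** 2).sum()
r2 = 1.0 - ssr / tss                                   # simple R^2
adj_r2 = 1.0 - (ssr / (n - k)) / (tss / (n - 1))       # adjusted R^2
print(r2, adj_r2)
```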

45 Hypothesis Testing in Linear Regression Models

Suppose you have the following linear model:
$$\log(\mathrm{wage}) = \beta_0 + \beta_1\,\mathrm{educ} + \beta_2\,\mathrm{exp} + \beta_3\,\mathrm{ten} + u,$$
where educ is education, exp is experience, and ten is tenure on the job. Suppose you obtained a regression output reporting, for each variable (constant, education, experience, tenure), the coefficient estimate and its standard error, together with the sample size (526) and the $R^2$.

46 Suppose you want to know whether the return to education is zero or not. That is, you want to test the hypothesis that the return to education is 0, when the estimated return and its standard error are as reported above. If the estimated value is very different from zero, then you should reject. But what is the proper way to measure whether the estimate is very different from zero or not? The variance of the OLS estimator becomes important here as well. If the OLS coefficient is accurately estimated, then even if it is close to zero, one should reject the hypothesis of zero returns to education. In this case the standard error is 0.007, which is fairly accurate.

47 A potential candidate for a test statistic for the hypothesis $\beta_j = \beta^0$ would be
$$\frac{\hat{\beta}_j - \beta^0}{\mathrm{st.dev}(\hat{\beta}_j)}.$$
If, relative to the accuracy of the OLS estimate, $\hat{\beta}_j$ is too far away from the hypothesized value $\beta^0$, then we reject the hypothesis. Suppose the OLS estimator is normally distributed with mean $\beta^0$,
$$\hat{\beta}_j \sim N(\beta^0, \sigma_{\beta_j}^2).$$
Then,
$$z = \frac{\hat{\beta}_j - \beta^0}{\mathrm{Var}(\hat{\beta}_j)^{1/2}} \sim N(0, 1).$$
Then, one can set up a rejection region with $R_\beta \geq 0$ such that we reject if
$$\left|\frac{\hat{\beta}_j - \beta^0}{\mathrm{Var}(\hat{\beta}_j)^{1/2}}\right| \geq R_\beta.$$

48 Then, we can choose $R_\beta$ such that
$$P(|z| \geq R_\beta) = 0.05.$$
Then, even if the hypothesis is true, i.e. $\beta = \beta^0$, the hypothesis will be rejected, i.e.
$$\left|\frac{\hat{\beta}_j - \beta^0}{\mathrm{Var}(\hat{\beta}_j)^{1/2}}\right| \geq R_\beta,$$
with probability 0.05. In this case, the probability of a Type I error (rejecting a true hypothesis) is 0.05. In other words, this test has a significance level of 0.05; that is, the size of the test is 0.05.

49 Type II error: Suppose the null hypothesis $\beta = \beta^0$ is not true. Then mistakenly failing to reject the false null hypothesis is a Type II error, whose probability equals 1 minus the power of the test.

$R_\beta$ is called the critical value and is often denoted $c_\alpha$, where $\alpha$ is the significance level of the test. For example, if $z \sim N(0, 1)$, the critical value $c_\alpha$ for $\alpha = 0.05$ is
$$c_{0.05} = 1.96,$$
i.e.
$$P(z \leq -1.96 \text{ or } z \geq 1.96) = 0.05, \qquad \text{or} \qquad P(-1.96 \leq z \leq 1.96) = 1 - 0.05 = 0.95.$$
Notice that because this is a two-tailed test,
$$P(z \leq -1.96) = 0.025, \qquad P(z \geq 1.96) = 0.025.$$
That is, if $\Phi(\cdot)$ is the (cumulative) distribution function of $N(0, 1)$,
$$\Phi(c_\alpha) = 1 - \alpha/2.$$

50 (Slide content not captured in the transcription.)

51 P-values

Suppose in the above example that
$$z = \frac{\hat{\beta}_j - \beta^0}{\mathrm{Var}(\hat{\beta}_j)^{1/2}} = 1.96.$$
Then the P-value is 0.05; that is, you barely fail to reject the hypothesis at significance level 0.05. If the P-value is 0.05, can you reject the hypothesis at a significance level of 0.1? The critical value for 0.1 is $c_{0.1} = 1.645$. Then, because $1.96 > 1.645$, the hypothesis is rejected. What about a significance level of 0.025? $c_{0.025} = 2.24 > 1.96$, hence the hypothesis cannot be rejected.

P-value (marginal significance): the greatest significance level for which the hypothesis cannot be rejected.

52 In general, the P-value for a two-tailed test with statistic $\hat{z}$ is
$$\Pr\left(|z| \geq |\hat{z}|\right).$$
Then, if the distribution of $z$ is standard normal,
$$p(\hat{z}) = 2\left(1 - \Phi(|\hat{z}|)\right), \qquad \text{or} \qquad \Phi(|\hat{z}|) = 1 - p(\hat{z})/2.$$
P-values preserve all the information from the estimation.
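
As a small illustration (not part of the original notes), the two-tailed p-value formula above in code, using the standard normal CDF from scipy:

```python
from scipy.stats import norm

# p = 2 * (1 - Phi(|z|)) for a two-tailed test
z_hat = 1.96
p_value = 2.0 * (1.0 - norm.cdf(abs(z_hat)))
print(p_value)   # approximately 0.05
```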

53 Normal Distribution

The density function of a standard normal random variable $u$ (with mean 0 and variance 1) is
$$f(u) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^2}{2}\right).$$
Consider now $x$, a normal random variable with mean $\mu$ and variance $\sigma^2$. Then,
$$u = \frac{x - \mu}{\sigma}.$$
Therefore, the density of $x$, $g(x)$, satisfies
$$g(x)\,dx = f(u)\left|\frac{du}{dx}\right|dx = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)\frac{1}{\sigma}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)dx.$$

54 Joint Distribution

The joint distribution of independent standard normal random variables $x_1 \sim N(0, 1)$, $x_2 \sim N(0, 1)$ is
$$f(x) = f(x_1)f(x_2) = \frac{1}{(\sqrt{2\pi})^2}\exp\left(-\frac{x_1^2 + x_2^2}{2}\right) = \frac{1}{\det(I)^{1/2}(\sqrt{2\pi})^2}\exp\left(-x^T x/2\right),$$
where
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

55 Now, consider a vector $y$ which is normally distributed with mean $\mu$ and variance matrix $\Sigma$. Let $A$ be a matrix that satisfies $AA^T = \Sigma$. Then $y$ can be expressed as
$$y = \mu + Ax,$$
because then
$$E(y) = \mu, \qquad \mathrm{Var}(y) = E\left([y - \mu][y - \mu]^T\right) = A I A^T = \Sigma.$$
Also,
$$x = A^{-1}(y - \mu), \qquad dx = \left|\det(A^{-1})\right|dy = \det(\Sigma)^{-1/2}dy.$$

56 Together, we obtain the joint normal density of $y$:
$$g(y)\,dy = f\left(A^{-1}(y - \mu)\right)\left|\frac{dx}{dy}\right|dy = \frac{1}{\det(\Sigma)^{1/2}(\sqrt{2\pi})^2}\exp\left(-\frac{1}{2}(y - \mu)^T\Sigma^{-1}(y - \mu)\right)dy.$$

Independence. Suppose $y_1, y_2$ are jointly normally distributed but not correlated, i.e. $\Sigma_{12} = \Sigma_{21} = 0$. Then $y_1, y_2$ are independent. The opposite is also true. This is because in this case
$$(y - \mu)^T\Sigma^{-1}(y - \mu) = \frac{(y_1 - \mu_1)^2}{\Sigma_{11}} + \frac{(y_2 - \mu_2)^2}{\Sigma_{22}}$$
and $\det(\Sigma) = \Sigma_{11}\Sigma_{22}$.

57 Therefore,
$$g(y) = \frac{1}{\sqrt{\Sigma_{11}}\sqrt{2\pi}}\exp\left(-\frac{(y_1 - \mu_1)^2}{2\Sigma_{11}}\right)\cdot\frac{1}{\sqrt{\Sigma_{22}}\sqrt{2\pi}}\exp\left(-\frac{(y_2 - \mu_2)^2}{2\Sigma_{22}}\right) = g_1(y_1)\,g_2(y_2).$$
Therefore, $y_1$ and $y_2$ are independent of each other.

We next show that the sum of two normal random variables is also normal. Consider the sum of two independent standard normal random variables, $y = x_1 + x_2$. Then, we can express $x_2 = y - x_1$. Therefore,

58
$$f(x)\,dx = g(x_1, y)\left|\det\left(\frac{\partial(x_1, x_2)}{\partial(x_1, y)}\right)\right|d(x_1, y) = \frac{1}{(\sqrt{2\pi})^2}\exp\left(-\frac{x_1^2 + (y - x_1)^2}{2}\right)d(x_1, y)$$
$$= \frac{1}{(\sqrt{2\pi})^2}\exp\left(-\frac{2(x_1 - \tfrac{y}{2})^2 + \tfrac{y^2}{2}}{2}\right)d(x_1, y) = \frac{1}{\sqrt{2\pi\cdot 2}}\exp\left(-\frac{y^2}{4}\right)\cdot\frac{1}{\sqrt{2\pi\cdot\tfrac{1}{2}}}\exp\left(-\frac{(x_1 - \tfrac{y}{2})^2}{2\cdot\tfrac{1}{2}}\right)d(x_1, y).$$
Hence we can see that, conditional on $y$, $x_1$ has mean $y/2$ and variance $1/2$, and integrating over $x_1$ we get the distribution of $y$ with the functional form
$$\frac{1}{\sqrt{2\pi\cdot 2}}\exp\left(-\frac{y^2}{4}\right);$$
that is, $y$ is normally distributed with mean 0 and variance 2.

59 Chi-Squared Distribution

Suppose $z$ is a vector of $m$ independently and identically distributed standard normal random variables. Then,
$$y = z^T z = \sum_{t=1}^m z_t^2$$
is chi-squared distributed with $m$ degrees of freedom, i.e. $y \sim \chi^2(m)$. Because of this, we can see that if $y_1 \sim \chi^2(m_1)$ and $y_2 \sim \chi^2(m_2)$, and $y_1$ and $y_2$ are independently distributed, then $y_1 + y_2$ is just like a sum of squares of $m_1 + m_2$ independently distributed standard normal random variables, so
$$y_1 + y_2 \sim \chi^2(m_1 + m_2).$$

60 Quadratic Forms and the Chi-Squared Distribution

1. If the $m$-vector $x \sim N(0, \Omega)$, then $x^T\Omega^{-1}x \sim \chi^2(m)$.

Let $A$ be such that $AA^T = \Omega$. Then,
$$\mathrm{Var}(A^{-1}x) = E\left[A^{-1}xx^T(A^{-1})^T\right] = A^{-1}\Omega(A^{-1})^T = A^{-1}AA^T(A^{-1})^T = I_m.$$
Therefore, $A^{-1}x \sim N(0, I_m)$, and therefore
$$\left(A^{-1}x\right)^T A^{-1}x = x^T\Omega^{-1}x \sim \chi^2(m).$$

61 2. If $P$ is a projection matrix with rank $r$ and $z \sim N(0, I_n)$, then $z^T P z \sim \chi^2(r)$.

Because $P$ is a projection matrix, there exists an $n \times r$ matrix $X$ such that
$$P = X(X^T X)^{-1}X^T.$$
Now, $X^T z$ is an $r \times 1$ vector, and
$$X^T z \sim N(0, X^T X).$$
Therefore, from result 1,
$$z^T X(X^T X)^{-1}X^T z \sim \chi^2(r).$$

62 Student's t Distribution

If $z \sim N(0, 1)$ and $y \sim \chi^2(m)$, and $z$ and $y$ are independent, then
$$t = \frac{z}{(y/m)^{1/2}}$$
has a Student's t distribution with $m$ degrees of freedom.

F Distribution

If $y_1 \sim \chi^2(m_1)$ and $y_2 \sim \chi^2(m_2)$ and they are independent, then
$$F = \frac{y_1/m_1}{y_2/m_2}$$
has an F distribution $F(m_1, m_2)$ with degrees of freedom $m_1, m_2$.

63 Exact Tests in the Classical Normal Linear Model

$$y = X\beta + u.$$
Additional assumption: $u$ is statistically independent of $X$ and normally distributed, $u \sim N(0, \sigma^2 I)$.

Test of a Single Restriction

$$y = X_1\beta_1 + X_2\beta_2 + u, \qquad u \sim N(0, \sigma^2 I).$$
Test of the hypothesis $\beta_2 = \beta_2^0$. OLS estimation (by FWL):
$$M_{X_1}y = M_{X_1}X_2\beta_2 + M_{X_1}u,$$
$$\hat{\beta}_2 = \left(X_2^T M_{X_1}X_2\right)^{-1}X_2^T M_{X_1}y, \qquad \mathrm{Var}(\hat{\beta}_2) = \sigma^2\left(X_2^T M_{X_1}X_2\right)^{-1}.$$

64 Then, if the null hypothesis is true, the true parameter is $\beta_2 = \beta_2^0$, and the OLS coefficient is normally distributed with mean $\beta_2^0$ and variance
$$\mathrm{Var}(\hat{\beta}_2) = \sigma^2\left(X_2^T M_{X_1}X_2\right)^{-1}.$$
Then, we know that
$$\frac{\hat{\beta}_2 - E[\hat{\beta}_2]}{\mathrm{Var}(\hat{\beta}_2)^{1/2}} \sim N(0, 1).$$
Therefore,
$$\frac{\hat{\beta}_2 - \beta_2^0}{\sigma\left(X_2^T M_{X_1}X_2\right)^{-1/2}} \sim N(0, 1).$$

65 The problem is that we do not know $\sigma^2$. To deal with this, we use $s^2$, the unbiased estimate of $\sigma^2$, i.e.
$$s^2 = \frac{1}{n-k}\sum_{t=1}^n\hat{u}_t^2 = \frac{y^T M_X y}{n-k}.$$
But the problem is that if you just substitute $s^2$ in the place where $\sigma^2$ used to be, then the resulting distribution is no longer normal, because $s^2$ is not fixed; it is random. Next, we will deal with this additional randomness.

66 Now, given $X$ being an $n \times k$ matrix of rank $k$, consider an $n \times (n-k)$ matrix $Z$ of rank $n-k$ such that
$$X^T Z = 0.$$
Then, because $[X, Z]$ is an $n \times n$ matrix with full rank,
$$y = [X, Z]\begin{bmatrix}\hat{\beta} \\ \hat{\gamma}\end{bmatrix} = X\hat{\beta} + Z\hat{\gamma}$$
has a solution. Notice that, because of orthogonality,
$$X^T Z\hat{\gamma} = 0.$$

67 Therefore, $Z\hat{\gamma}$ is the residual $\hat{u}$ of the OLS regression of $y$ on $X$, i.e.
$$\hat{u} = Z\hat{\gamma} = Z(Z^T Z)^{-1}Z^T y = Z(Z^T Z)^{-1}Z^T(X\beta + u) = Z(Z^T Z)^{-1}Z^T u.$$
Then, because $u/\sigma \sim N(0, I)$,
$$\frac{\mathrm{SSR}}{\sigma^2} = \frac{u^T Z(Z^T Z)^{-1}Z^T u}{\sigma^2} \sim \chi^2(n-k).$$
Then, because
$$z = \frac{\hat{\beta}_2 - \beta_2^0}{\sigma\left(X_2^T M_{X_1}X_2\right)^{-1/2}} \sim N(0, 1), \qquad \frac{\mathrm{SSR}}{\sigma^2} \sim \chi^2(n-k),$$
and if $z$ and SSR are independent, then
$$\frac{z}{\sqrt{\dfrac{\mathrm{SSR}}{\sigma^2(n-k)}}} \sim t(n-k).$$

68 We next show that $z$ and SSR are independent. First, we show that $\hat{\beta}$ and $\hat{u}$ are uncorrelated. This is because
$$\hat{u}\left(\hat{\beta} - \beta\right)^T = \left(I - X(X^T X)^{-1}X^T\right)uu^T X(X^T X)^{-1}.$$
Hence,
$$E\left[\hat{u}\left(\hat{\beta} - \beta\right)^T\mid X\right] = \sigma^2\left(I - X(X^T X)^{-1}X^T\right)X(X^T X)^{-1} = \sigma^2\left(X(X^T X)^{-1} - X(X^T X)^{-1}\right) = 0.$$
Because both $\hat{\beta}$ and $\hat{u}$ are normally distributed, and they are uncorrelated with each other, they are independently distributed; hence $z$ (a function of $\hat{\beta}_2$) and SSR (a function of $\hat{u}$) are independent.

69 Therefore,
$$\frac{\hat{\beta}_2 - \beta_2^0}{s\left(X_2^T M_{X_1}X_2\right)^{-1/2}} = \frac{\hat{\beta}_2 - \beta_2^0}{\left(s^2\left[X_2^T M_{X_1}X_2\right]^{-1}\right)^{1/2}} \sim t(n-k),$$
where $s^2\left[X_2^T M_{X_1}X_2\right]^{-1}$ is the estimated variance of $\hat{\beta}_2$.
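
A short code sketch of this t test (not part of the original notes): t statistics for the null that each coefficient is zero, using $s^2$-based standard errors, with $t(n-k)$ p-values. Data and values are illustrative.

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(8)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - k)
se = np.sqrt(np.diag(s2 * XtX_inv))
t_stat = beta_hat / se                                   # H0: beta_j = 0
p_val = 2.0 * t_dist.sf(np.abs(t_stat), df=n - k)        # two-tailed p-values
print(t_stat, p_val)
```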

70 Tests of Several Restrictions

Suppose that $X_1$ is an $n \times k_1$ matrix and $X_2$ is an $n \times k_2$ matrix, $\beta_1$ is a $k_1 \times 1$ vector and $\beta_2$ is a $k_2 \times 1$ vector with $k_2 > 1$. If we want to test $\beta_2 = 0$, then the above t-test does not work. So, instead, we use the F-test based on the F distribution. The null hypothesis is
$$H_0:\quad y = X_1\beta_1 + u, \qquad u \sim N(0, \sigma^2 I),$$
and the alternative hypothesis is
$$H_1:\quad y = X_1\beta_1 + X_2\beta_2 + u, \qquad u \sim N(0, \sigma^2 I).$$

71 Instead, we use the chi-squared distributions of the residuals. For the residuals if the null hypothesis were true,
$$\frac{\hat{u}_r^T\hat{u}_r}{\sigma^2} \sim \chi^2(n - k_1).$$
For the residuals under the alternative hypothesis,
$$\frac{\hat{u}_u^T\hat{u}_u}{\sigma^2} \sim \chi^2(n - k_1 - k_2).$$
Because of this, we consider using an F-test, which involves two chi-squared distributed quantities. But for the F-test, those two quantities need to be independent, and $\hat{u}_r$ and $\hat{u}_u$ are not independent of each other; thus neither are $\hat{u}_r^T\hat{u}_r$ and $\hat{u}_u^T\hat{u}_u$. So we cannot use the above sums of squared residuals directly.

72 But we will show below that $\hat{u}_u$ and $\hat{u}_r - \hat{u}_u$ are actually uncorrelated, and thus independent, because they are normally distributed. Therefore, we can do the F-test using
$$\frac{\hat{u}_u^T\hat{u}_u}{\sigma^2} \sim \chi^2(n - k_1 - k_2)$$
and
$$\frac{(\hat{u}_r - \hat{u}_u)^T(\hat{u}_r - \hat{u}_u)}{\sigma^2} \sim \chi^2(k_2),$$
and
$$F = \frac{(\hat{u}_r - \hat{u}_u)^T(\hat{u}_r - \hat{u}_u)/k_2}{\hat{u}_u^T\hat{u}_u/(n - k_1 - k_2)} \sim F(k_2, n - k_1 - k_2).$$

73 Then, the restricted sum of squared residuals (RSSR), i.e. the SSR when $\beta_2 = 0$ is imposed, is
$$\mathrm{RSSR} = y^T M_{X_1}y,$$
and without the restriction,
$$\hat{u}_r = M_{X_1}y = M_{X_1}X_2\hat{\beta}_2 + M_{X_1}\hat{u}_u = M_{X_1}X_2\hat{\beta}_2 + \hat{u}_u.$$
Next, we show that $M_{X_1}X_2\hat{\beta}_2$ and $\hat{u}_u$ are uncorrelated:
$$E\left[M_{X_1}X_2\left(\hat{\beta}_2 - \beta_2\right)\hat{u}_u^T\right] = E\left[M_{X_1}X_2\left(X_2^T M_{X_1}X_2\right)^{-1}X_2^T M_{X_1}uu^T M_X\right] = \sigma^2 M_{X_1}X_2\left(X_2^T M_{X_1}X_2\right)^{-1}X_2^T M_{X_1}M_X = 0.$$

74 Therefore, $\hat{u}_r - \hat{u}_u = M_{X_1}X_2\hat{\beta}_2$ and $\hat{u}_u$ are uncorrelated, and thus independent of each other. Furthermore,
$$\hat{u}_r - \hat{u}_u = M_{X_1}X_2\left(X_2^T M_{X_1}X_2\right)^{-1}X_2^T M_{X_1}y = M_{X_1}X_2\beta_2 + P_{M_{X_1}X_2}u.$$
Therefore, given the hypothesis $\beta_2 = 0$, because $P_{M_{X_1}X_2}$ is a projection matrix (of rank $k_2$),
$$\frac{(\hat{u}_r - \hat{u}_u)^T(\hat{u}_r - \hat{u}_u)}{\sigma^2} = \frac{u^T P_{M_{X_1}X_2}u}{\sigma^2} \sim \chi^2(k_2),$$
and finally,
$$(\hat{u}_r - \hat{u}_u)^T\hat{u}_u = \hat{\beta}_2^T X_2^T M_{X_1}\hat{u}_u = 0.$$
Therefore,
$$(\hat{u}_r - \hat{u}_u)^T(\hat{u}_r - \hat{u}_u) = (\hat{u}_r - \hat{u}_u)^T\hat{u}_r = \hat{u}_r^T\hat{u}_r - \hat{u}_u^T\left(\hat{u}_u + (\hat{u}_r - \hat{u}_u)\right) = \hat{u}_r^T\hat{u}_r - \hat{u}_u^T\hat{u}_u.$$

75 Together, we have shown that
$$F = \frac{\left(\hat{u}_r^T\hat{u}_r - \hat{u}_u^T\hat{u}_u\right)/k_2}{\hat{u}_u^T\hat{u}_u/(n - k_1 - k_2)} = \frac{(\mathrm{RSSR} - \mathrm{USSR})/k_2}{\mathrm{USSR}/(n - k_1 - k_2)} \sim F(k_2, n - k_1 - k_2).$$
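
A code sketch of this F test (not part of the original notes): compute the restricted and unrestricted sums of squared residuals on simulated data and form the statistic above. Values are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(9)
n, k1, k2 = 100, 2, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, k2))
y = X1 @ np.array([1.0, 0.5]) + X2 @ np.array([0.3, 0.0]) + rng.normal(size=n)

def ssr(Xmat, yvec):
    b = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ yvec)
    e = yvec - Xmat @ b
    return e @ e

rssr = ssr(X1, y)                          # restricted: beta_2 = 0 imposed
ussr = ssr(np.column_stack([X1, X2]), y)   # unrestricted
F = ((rssr - ussr) / k2) / (ussr / (n - k1 - k2))
p_val = f_dist.sf(F, k2, n - k1 - k2)
print(F, p_val)
```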

76 Tests of General Linear Restrictions

All linear restrictions on the parameters can be expressed as
$$R\beta = 0,$$
where $\beta$ is a $k \times 1$ vector, $R$ is an $r \times k$ matrix of rank $r$ (consisting of $r$ linearly independent rows), and $r$ is the number of restrictions. For example, the restriction $\beta = 0$ is
$$R\beta = 0 \quad \text{with} \quad R = I_k.$$
The restriction $\beta_1 = 0, \ldots, \beta_{k_1} = 0$ can be expressed similarly with
$$R = [I_{k_1}, 0].$$

77 Then, as we have seen before, if we assume that $u \sim N(0, \sigma^2 I)$, then
$$\hat{\beta} \sim N\left(\beta, \sigma^2(X^T X)^{-1}\right).$$
Therefore,
$$E\left[R\hat{\beta}\mid X\right] = R\beta, \qquad \mathrm{Var}\left(R\hat{\beta}\mid X\right) = R\,\mathrm{Var}(\hat{\beta})\,R^T = \sigma^2 R(X^T X)^{-1}R^T.$$
Therefore, as we have seen before,
$$\frac{\left(R\hat{\beta} - R\beta\right)^T\left[R(X^T X)^{-1}R^T\right]^{-1}\left(R\hat{\beta} - R\beta\right)}{\sigma^2} \sim \chi^2(r).$$
The remaining thing to do is to substitute $s^2$ for $\sigma^2$, and then the statistic will be F-distributed.

78 As before, we need to show that $s^2$ and $R\hat{\beta}$ are independent of each other. We first show that $\hat{u}$ and $\hat{\beta}$ are independent. They are both normally distributed, and
$$E\left[\hat{u}\left(\hat{\beta} - \beta\right)^T\mid X\right] = \sigma^2\left(I - X(X^T X)^{-1}X^T\right)X(X^T X)^{-1} = 0.$$
Since they are both normally distributed and uncorrelated, they are independent. Therefore, $s^2$ and $\hat{\beta}$ are also independent, and thus $s^2$ and $R\hat{\beta}$ are independent. Therefore,
$$\frac{\left(R\hat{\beta} - R\beta\right)^T\left[R(X^T X)^{-1}R^T\right]^{-1}\left(R\hat{\beta} - R\beta\right)/r}{s^2} \sim F(r, n - k).$$
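
A final code sketch (not part of the original notes): the F statistic above for a set of general linear restrictions $R\beta = 0$, on simulated data with an illustrative restriction matrix.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(10)
n, k = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
s2 = resid @ resid / (n - k)

R = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # restriction: third and fourth coefficients are zero
r = R.shape[0]
Rb = R @ b
# F = (R b)' [R (X'X)^{-1} R']^{-1} (R b) / (r s^2)
F = Rb @ np.linalg.solve(R @ XtX_inv @ R.T, Rb) / (r * s2)
print(F, f_dist.sf(F, r, n - k))
```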
